Description

[0001] Ordered arrays of oligonucleotides ("oligos") 
immobilized on a solid support have been proposed for 
sequencing DNA fragments. It has been recognized that 
hybridization of a cloned single-stranded DNA fragment 
to all possible oligo probes of a given length can identify 
the corresponding, complementary oligo segments that 
are present somewhere in the fragment, and that this 
information can sometimes be used to determine
the DNA sequence. Use of arrays can greatly facilitate 
the surveying of a DNA fragment's oligo segments. 
[0002] References disclosing arrays for sequencing
include EP 0273 203 B1 (Southern); Khrapko et al., A
Method for DNA Sequencing by Hybridization with
Oligonucleotide Matrix, DNA Sequence - J. DNA
Sequencing and Mapping, vol. 1, pp. 375-378 (1991)
(Khrapko et al. 1991); and Khrapko et al., An
Oligonucleotide Hybridization Approach to DNA
Sequencing, FEBS Letters, vol. 256, no. 1,2, pp. 118-122
(October 1989) (Khrapko et al. 1989). Each of these
references discloses hybridization arrays of all possible
nucleotides of a given length, for sequencing.

[0003] For short strands or even small genomes,
realistic arrays of all possible nucleotides of a given length
can be used for sequencing. The above-cited references
disclose, for example, using 3-mers (4^3 = 64 areas)
and 8-mers (4^8 = 65,536 areas). Southern acknowledges
that the size of the matrix must be very large to
sequence genomes. For yeast, Southern teaches that one
would have to use 14-mers (2.6 x 10^8 areas). For the
human genome, Southern teaches that one would have
to use 18-mers (6.7 x 10^10 areas), an enormously large
and complex array.
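
The growth of array size with probe length can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `array_areas` is illustrative and does not come from the cited references. It assumes only that each position of an n-mer can hold any one of four nucleotides.

```python
def array_areas(n: int) -> int:
    """Number of areas needed for an array of all possible n-mers."""
    if n < 1:
        raise ValueError("oligo length must be at least 1")
    # Four possible nucleotides at each of n positions.
    return 4 ** n


for n in (3, 8, 14, 18):
    areas = array_areas(n)
    print(f"{n}-mers: {areas:,} areas (about {areas:.1e})")
```

The exact powers of four for 14-mers and 18-mers are of the same order as the figures attributed to Southern above.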

[0004] In an oligonucleotide array each oligo probe is 
immobilized on a solid support at a different predeter- 
mined position. The array allows one to simultaneously 
survey all the oligo segments in a DNA fragment strand. 
Many copies of the strand are required, of course. Ide- 
ally, surveying is carried out under conditions to ensure 
that only perfectly matched hybrids will form. Oligo seg- 
ments present in the strand can be identified by deter- 
mining those positions in the array where hybridization 
occurs. The nucleotide sequence of the DNA some- 
times can be ascertained by ordering the identified oligo 
segments in an overlapping fashion. For every identified 
oligo segment, there must be another oligo segment 
whose sequence overlaps it by all but one nucleotide. 
The entire sequence of the DNA strand can be repre- 
sented by a series of overlapping oligos, each of equal 
length, and each located one nucleotide further along 
the sequence. As long as every overlap is unique, all of 
the identified oligos can be assembled into a contiguous 
sequence block. 
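
The assembly rule described above can be sketched in code. This is an illustration only, under the idealized assumption that hybridization reports exactly the oligo segments present in the strand; the names `oligo_spectrum` and `assemble` are hypothetical.

```python
def oligo_spectrum(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers (oligo segments) present in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assemble(spectrum: set[str]) -> str:
    """Order oligos that overlap by all but one nucleotide.

    Raises ValueError when an overlap is not unique, i.e. when the
    oligos cannot be assembled into one contiguous sequence block.
    """
    if not spectrum:
        raise ValueError("empty spectrum")
    by_prefix: dict[str, str] = {}
    suffixes: set[str] = set()
    for oligo in spectrum:
        prefix, suffix = oligo[:-1], oligo[1:]
        if prefix in by_prefix or suffix in suffixes:
            raise ValueError(f"overlap is not unique near {oligo}")
        by_prefix[prefix] = oligo
        suffixes.add(suffix)
    # The first oligo is the only one whose prefix is no other oligo's suffix.
    starts = [o for o in spectrum if o[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting oligo")
    current = starts[0]
    sequence, used = current, 1
    while current[1:] in by_prefix:
        current = by_prefix[current[1:]]
        sequence += current[-1]  # each step adds one nucleotide
        used += 1
    if used != len(spectrum):
        raise ValueError("oligos do not form one contiguous block")
    return sequence


target = "ATGCGTAC"
print(assemble(oligo_spectrum(target, 3)))
```

Each step of the loop moves one nucleotide further along the sequence, mirroring the series of overlapping oligos of equal length described in the text.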

[0005] There is an important limitation to sequencing by
known surveying techniques. As relatively longer DNA
strands are surveyed, there is an increasing probability
that more than two identified oligos will share the same
overlapping sequence, i.e., the overlap is not unique.
When this occurs, the sequence of the DNA cannot be
unambiguously determined. Instead of one contiguous
sequence block that contains the entire DNA sequence,
the oligos can only be assembled into a number of
smaller sequence blocks, whose order is not known.
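
A non-unique overlap can be detected by grouping the identified oligos by the overlap they begin with. The sketch below is illustrative only; `branching_overlaps` is a hypothetical name, and the example strand is invented for the demonstration.

```python
from collections import defaultdict


def branching_overlaps(seq: str, k: int) -> dict[str, list[str]]:
    """Map each (k-1)-nucleotide overlap to the oligos that begin with it,
    keeping only overlaps shared by more than one distinct oligo."""
    successors: dict[str, set[str]] = defaultdict(set)
    for i in range(len(seq) - k + 1):
        oligo = seq[i:i + k]
        successors[oligo[:-1]].add(oligo)
    return {ov: sorted(ol) for ov, ol in successors.items() if len(ol) > 1}


# "TG" is followed by two different oligos, so the order of the
# sequence blocks on either side of it cannot be decided.
print(branching_overlaps("ATGCATGG", 3))
```

An empty result means every overlap has a single successor, which is the condition under which one contiguous sequence block can be assembled.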
[0006] Khrapko et al. 1991 and Khrapko et al. 1989
disclose one means for obtaining additional information
to resolve such ambiguities. Using, for example, an
array of 8-mers in a first hybridization in which conditions
are selected for octanucleotide hybrids but not shorter
hybrids, ambiguities remain. Additional information to
help resolve branching is obtained from a second
hybridization to an array of 8-mers, this time adding to
particular areas not only the target strand but also a labeled
5-mer selected from a library of 5-mers, in which
conditions are selected for hexanucleotide hybrids but not
shorter hybrids. Only labeled 5-mers hybridized on the
target strand adjacent to the immobilized 8-mer/target
hybrid will not be washed away. This technique provides
a limited amount of additional sequence information that
is adequate only for simple systems. For complex
systems, complex ambiguities remain.
[0007] In accordance with a first aspect of the invention
there is provided a sectioned binary oligonucleotide
array comprising an array of predetermined areas on a
surface of a solid support, wherein said areas are
physically separated from one another into sections, such
that nucleic acids in an aqueous solution generated in
one section cannot migrate to another section, each
area having therein, covalently linked to said surface,
multiple copies of a binary oligonucleotide of a
predetermined sequence, said binary oligonucleotide consisting
of a constant sequence of base-pairing nucleotides
adjacent to a variable sequence of base-pairing
nucleotides and wherein the constant sequence is the same
for all oligonucleotides in the array.
[0008] In accordance with a second aspect of this
invention there is provided also a method of sorting a
mixture of nucleic acid strands comprising the steps of: a)
providing a solution containing a mixture of nucleic acid
strands in single-stranded form, b) providing a first
binary oligonucleotide array of predetermined areas on a
surface of a solid support, each area having therein,
covalently linked to said surface, copies of a binary
oligonucleotide consisting of a constant sequence of
base-pairing nucleotides adjacent to a variable sequence of
base-pairing nucleotides, the constant nucleotide
sequence being the same for all oligonucleotides in the
array, c) contacting said solution to said first binary
oligonucleotide array, and d) hybridizing said nucleic acid
strands to binary oligonucleotides in said array under
conditions sufficiently stringent to promote hybrids of the
length of the immobilized oligonucleotides but not
shorter hybrids. Preferably the first binary oligonucleotide
array is a sectioned array according to the first aspect of
this invention.

[0009] In accordance with a third aspect of the invention
there is provided further a method for sorting
terminally truncated partial copies of at least one nucleic acid
strand by their variable termini utilizing an array of im- 
mobilized binary oligonucleotides having a constant se- 
quence of at least three nucleotides adjacent to a vari- 
able sequence of at least three nucleotides, comprising 
a) hybridizing to said immobilized oligonucleotides a 
masking oligonucleotide that is complementary either to 
said constant sequence or to a portion thereof that is 
adjacent to said variable sequence; b) hybridizing the 
partial copies to the array under conditions that promote 
formation of hybrids of the length of said variable se- 
quence but not shorter lengths; c) ligating the masking 
oligonucleotides to partial copies that have hybridized 
to said variable sequence by their variable termini; and 
d) increasing the stringency of the hybridization condi- 
tions to remove hybrids shorter than the combined 
length of the masking oligonucleotide and the variable 
sequence. 

[0010] In accordance with a fourth aspect of the in- 
vention there is provided also a method for sequencing 
the oligonucleotide content of a nucleic acid strand uti- 
lizing a comprehensive array of binary immobilized oli- 
gonucleotides containing all possible variable sequenc- 
es of a given length from three to eight nucleotides, com- 
prising a) preparing a complete set of terminally truncat- 
ed copies of said strand; b) terminally sorting said copies 
into groups having common variable ends utilizing a bi- 
nary array by the method of the third aspect of this in- 
vention; c) surveying the oligonucleotide content of each 
group by hybridizing the same to said comprehensive 
array; and d) detecting where hybridization to the array 
has occurred. 

[0011] In accordance with a fifth aspect of the inven- 
tion there is provided additionally a method for obtaining 
information to allocate sequenced and ordered frag- 
ments from sister chromosomes to chromosomal link- 
age groups, comprising a) preparing a restriction digest 
different from any digest used to sequence and order 
said fragments, thereby producing fragments that span 
the junctions between said ordered fragments; b) termi- 
nally sorting said fragments utilising a binary oligonucle- 
otide array according to the method of the second as- 
pect of the invention in which the nucleic acid strands 
have a common terminal restriction site that is comple- 
mentary to the constant sequence; c) preparing termi- 
nally truncated partial copies of the sorted fragments in 
individual wells of said binary oligonucleotide array by 
a method comprising: (i) hybridizing the strand to the 
array by an oligonucleotide segment contained in the 
strand, said array comprising predetermined areas on 
a surface of a solid support, each area having therein 
immobilized oligonucleotides consisting of a predeter- 
mined variable sequence, said hybridization taking 
place under conditions that promote the formation of hy- 
brids of the length of the immobilized oligonucleotide in 
each area but not shorter hybrids, and (ii) where the 
strand is hybridized to a 3' array, enzymatically extending
the immobilized oligonucleotide using the hybridized
strand as a template, and where the strand is hybridized
to a 5' array, hybridizing a primer to a priming region
contained in the 3' terminus of the hybridized strand,
then enzymatically extending the primer to form an
extension product and ligating the extension product to the
immobilized oligonucleotide; d) hybridizing said partial
copies to an array of all variable nucleotides of a given
length; and e) detecting where hybridization has occurred on
the latter array.

[0012] Finally, there is provided in accordance with a
sixth aspect of the invention a method for surveying
oligonucleotides in a nucleic acid strand comprising a)
randomly degrading the strand into pieces that are as
short as possible but whose average length exceeds by
at least one nucleotide the length of oligonucleotides to
be surveyed by hybridization to variable sequences of
binary oligonucleotides; b) ligating the pieces to a
ligating oligonucleotide complementary to at least a
portion of a constant sequence of immobilized
oligonucleotides in a binary array according to the first aspect of
the invention; c) hybridizing the pieces to the binary
array, said binary array having immobilized
oligonucleotides in an ordered array therein and consisting of a
constant sequence adjacent to a variable sequence, the
immobilized oligonucleotides in an individual area of the
array having the same sequence; and d) detecting the
hybrids formed.

[0013] A "binary array" according to the invention
contains immobilized oligos comprised of two sequence
segments of predetermined length, one variable and the
other constant. The constant segment is the same in
every oligo of the array. The variable segments can vary
both in sequence and length. Binary arrays have
advantages compared with ordinary arrays: (1) they can be
used to sort strands according to their terminal
sequences, so that each strand binds to a fixed location (an
address) within the array; (2) longer oligos can be used on
an array of a given size, thereby increasing the
selectivity of hybridization; this allows strands to be sorted
according to the identity of internal oligo segments
adjacent to a particular constant sequence (such as a
segment adjacent to a recognition site for a particular
restriction endonuclease), and this allows strands to be
surveyed for the presence of signature oligos that
contain a constant segment in addition to a variable
segment; (3) universal sequences, such as priming sites,
can be introduced into the termini of sorted strands
using the binary arrays, thereby enabling the strands'
specific amplification without synthesizing primers specific
for each strand, and without knowledge of each strand's
terminal sequences; and (4) the specificity of
hybridization during surveying can be increased by coupling
hybridization to a ligation event that discriminates against
terminal base-pair mismatches.
[0014] A "sectioned array" as used herein is one
divided into sections, so that every individual area is
mechanically separated from all other areas, such as, for
example, a depression on the surface, or a "well". The
areas have different oligos immobilized thereon. A sec- 
tioned array allows many reactions to be performed si- 
multaneously, both on the surface of the solid support 
and in solution, without mixing the products of different 
reactions. The reactions occurring in different wells are 
highly specific due to the nucleotide sequence of the im- 
mobilized oligo. A large number of sortings and manip- 
ulations of nucleic acids can be carried out in parallel, 
by amplifying or modifying only those nucleic acids in 
each well that are perfectly hybridized to the immobi- 
lized oligos. Nucleic acids prepared on a sectioned array 
can be transferred to other arrays (replicated) by direct 
blotting of the wells' contents (printing), without mixing 
the contents of different wells of the same array. Fur- 
thermore, the presence of individual sections in arrays 
allows multiple re-hybridizations of bound nucleic acids 
to be performed, resulting in a significant increase in hy- 
bridization specificity. It is particularly advantageous ac- 
cording to this invention to use a binary array that is sec- 
tioned. 

[0015] The methods of the invention use sectioned ar- 
rays to sort mixtures of nucleic acid strands, either RNA 
or DNA. As used herein, "strand" means not just a single 
strand, but multiple copies thereof; and "mixture of 
strands" means a mixture of copies of different strands 
no matter how many copies of each are present. Simi- 
larly "fragment" refers to multiple copies thereof, and 
"mixture of fragments" means a mixture of copies of dif- 
ferent fragments. The methods include sorting strands 
either according to their terminal oligo segments (3'-terminal
or 5'-terminal), or according to their internal oligo
segments on a binary array. Before or after sorting, uni- 
versal priming region(s) can be added to the strands' 
termini to enable amplification. Binary sectioned arrays 
for sorting according to strands' terminal sequences 
("terminal sequence sorting arrays") can be comprehen- 
sive. A "comprehensive array" is one wherein any pos- 
sible strand will hybridize to at least one immobilized ol- 
igo. This type of sorting is particularly useful for prepar- 
ing comprehensive libraries of fragments of a large ge- 
nome. For example, in one embodiment of the invention, 
strands of restriction fragments have their restriction 
sites restored and are sorted on a binary array. That ar- 
ray contains immobilized oligos whose constant seg- 
ments contain the sequence complementary to the re- 
striction site, and an adjacent variable segment. The ar- 
ray is complete, containing all variable sequences of 
each type in separate areas. 

[0016] The invention also includes using sectioned ar- 
rays for preparing every possible partial copy of a strand 
or a group of strands. The term "partial" refers to multiple 
copies thereof. Partials are prepared by either of the fol- 
lowing methods: (1) terminal sorting on a binary sec- 
tioned array of a mixture of all possible partial strands 
generated by random degradation of a parental strand; 
or (2) generation of partials directly on an array, through
the sorting on an ordinary sectioned array of parental
strands according to the identity of their internal oligo
sequences, followed by the synthesis of partial copies
of each parental strand by enzymatic extension of the
immobilized oligos utilizing the hybridized parental
strands as templates. In either case, generated partials 
correspond to a parental strand whose 3' or 5' end is 
truncated to all possible extents (at the "variable" end of 
the partial), and whose other end is preserved (at the 
"fixed" end of the partial). These are "one-sided par- 
tials." Unless otherwise indicated the word "partial" is 
used herein to refer to one-sided partials. 
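
The definition of one-sided partials can be made concrete with a small sketch. This only enumerates the sequences of the partials; it does not model either preparation method. The function name `one_sided_partials` and its parameters are hypothetical, and strands are assumed to be written 5' to 3'.

```python
def one_sided_partials(parent: str, fixed_end: str = "5'",
                       min_len: int = 1) -> list[str]:
    """List every one-sided partial of a parental strand (written 5'->3').

    fixed_end names the end that is preserved; the other ("variable")
    end is truncated to all possible extents.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    lengths = range(len(parent), min_len - 1, -1)
    if fixed_end == "5'":
        # 5' end preserved, 3' end truncated.
        return [parent[:n] for n in lengths]
    if fixed_end == "3'":
        # 3' end preserved, 5' end truncated.
        return [parent[-n:] for n in lengths]
    raise ValueError("fixed_end must be \"5'\" or \"3'\"")


print(one_sided_partials("ACGT"))
print(one_sided_partials("ACGT", fixed_end="3'"))
```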
[0017] The invention also includes methods of using 
oligo arrays to obtain oligo information as part of a proc- 
ess for determining the nucleotide sequence of a long 
nucleic acid strand, or of many nucleic acid strands in 
an unknown mixture. A complete set of one-sided par- 
tials of the strand or strands is prepared on a sectioned 
array, and the oligo content of the partial strands in each 
well of the array is separately surveyed (i.e. each group 
of partials sharing the same oligo at the partials' variable 
end is surveyed). 

[0018] The invention also includes methods of using 
oligo arrays for ordering previously sequenced frag- 
ments from a first restriction digest of a large nucleic 
acid or even a genome. 

[0019] The invention also includes methods of using 
oligo arrays for allocating sequenced and ordered allelic 
fragments into their chromosomal linkage groups. 
[0020] The invention also includes a method of using 
binary arrays for surveying the oligos contained in 
strands or their partials. This method provides improved 
comprehensive surveys over the conventional survey- 
ing of oligos on an ordinary array. 
[0021] The invention is described below in greater de- 
tail by way of example only with reference to the accom- 
panying drawings, in which 

Figure 1 shows a binary array; 
Figure 1a shows an oligo immobilized in an area of 
a binary array; 

Figure 2 shows a sectioned array having depres- 
sions; 

Figure 2a shows a well of a sectioned array; 
Figure 3 shows addition of a lattice to a support to 
make a sectioned array; 

Figure 4 shows an example of sorting and amplifi- 
cation of restriction fragments on a sectioned binary 
array; 

Figure 5 shows an example of preparing partials on 
a sectioned ordinary array; 
Figure 6 shows, schematically, the order of steps 
for sequencing a complete genome; 
Figure 7 shows, schematically, the use of a sheet
with a number of miniature survey arrays for
simultaneously surveying every well in a partialing array;
and 

Figures 8 to 11 show examples of the determination
of nucleotide sequences from indexed address sets
obtained from analysis of mixtures of strands.

I. Oligonucleotide arrays 

[0022] As used herein an "oligonucleotide array" is an 
array of regularly situated areas on a solid support 
wherein different oligos are immobilized, typically by 
covalent linkage. Each area contains a different oligo 
whose location is predetermined. 
[0023] Arrays can be classified by the composition of 
their immobilized oligos. "Ordinary arrays" contain oli- 
gos comprised entirely of "variable segments". Every 
position of the oligo sequence in such a segment can 
be occupied by any one of the four commonly occurring 
nucleotides. 

[0024] Comprehensive ordinary arrays are those 
wherein any segment of any possible strand will hybrid- 
ize perfectly to the length of one or more immobilized 
oligos so that no strand is lost. 

[0025] Binary arrays differ from ordinary arrays. A bi- 
nary array is illustrated in Figures 1 and 1a. Figure 1 
shows a substrate or support 1 having immobilized ther- 
eon an array of oligos 3, each oligo being in a separate 
area 2 of support 1. Figure 1a shows one area 2. A bi- 
nary oligo 3 (many copies, of course) comprised of con- 
stant region 5 and variable region 6 is covalently bound 
to support 1 by covalent linking moiety 4. 
[0026] Because of the constant segments, binary ar- 
rays provide means for the hybridization of longer se- 
quences without increasing the size of the array. The 
constant segment can be located within the immobilized 
oligo either "upstream" of the variable segment (i.e., to- 
ward or at the 5' end of the oligo) or "downstream" from 
the variable segment (i.e., toward or at the 3' end of the 
oligo). The type of array that is chosen depends on the 
specific application. The constant region preferably is or 
includes a good priming region for amplification of hy- 
bridized strands by a polymerase chain reaction (PCR), 
or a promoter for copying the strand by transcription. 
Generally a length of 15 to 25 nucleotides is suitable for
priming. The constant region can contain all or part of 
the complement of a restriction site. A binary array can 
be "plain" or "sectioned" (see below). 
[0027] "Plain arrays" known in the art are arrays in 
which the individual areas are not physically separated
from one another. Reactions carried out simultaneously 
are limited to those in which the nucleic acid templates 
and the reaction products are bound in some manner to 
the surface of the array to avoid the intermixing of prod- 
ucts. 

[0028] "Sectioned arrays" are divided into sections,
so that each area is physically separated by mechanical
or other means (e.g., a gel) from all the other areas, e.g.,
by depressions on the surface, each called a "well". There
are many techniques apparent to one skilled in the art
for preventing the exchange of materials between areas;
any such method can be used to make a "sectioned"
array, as that term is used herein, even though there
might not be a physical wall between areas.
[0029] One type of sectioned array is illustrated in
Figures 2 and 2a. Figure 2 shows a support sheet 60 having
an array of depressions or wells 62, each containing
many copies of an immobilized oligo 64. Figure 2a
shows one well 62 of the array of Figure 2. Well 62
formed in support 60 has therein oligo 64 covalently
bound to support 60 by covalent linking moiety 66. In
practice one may prepare a plain array, e.g., on a flat
sheet, and then, at a point during a series of steps
involving its use, convert the array into a sectioned array,
e.g., by making physical depressions in a deformable
solid support to isolate the individual areas. The
sectioned array can also be created by applying a lattice to
the solid support and bonding it to the surface so that
each area is surrounded by impermeable walls. An
exploded perspective view of such a sectioned array is
shown in Figure 3. Support or substrate 70, here a
planar sheet, has mounted thereon and affixed thereto a
lattice 72 comprised of a series of horizontal members
74, 76. The lattice members define a series of open
areas which, in conjunction with support 70, define an
array of wells 78. In some applications it is preferable to
utilize a detachable lattice (or a removable cover sheet),
so that the sectioned array can be converted back to a
plain array.

[0030] Sectioned arrays according to this invention
can be used to increase the specificity of hybridization
of nucleic acids to the immobilized oligos. After
hybridization, unhybridized strands can be washed away.
Hybridized strands can then be released into solution
without mixing. Released strands can be rebound to the
immobilized oligos, and unhybridized strands can be
washed away. Each successive release, rebinding, and
washing increases the ratio of perfectly matched hybrids
to mismatched hybrids.
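
The effect of repeated cycles can be illustrated with a deliberately simple model. The model is an assumption made for this sketch and is not stated in the text: each cycle is taken to retain a fixed fraction of perfectly matched hybrids and a smaller fixed fraction of mismatched hybrids. The function name and the retention values are hypothetical.

```python
def ratio_after_cycles(initial_ratio: float, keep_perfect: float,
                       keep_mismatched: float, cycles: int) -> float:
    """Ratio of perfect to mismatched hybrids after repeated
    release / rebinding / washing cycles.

    Assumed model: every cycle retains keep_perfect of the perfect
    hybrids and keep_mismatched of the mismatched hybrids.
    """
    if not 0 < keep_mismatched <= keep_perfect <= 1:
        raise ValueError("expected 0 < keep_mismatched <= keep_perfect <= 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    # The ratio is multiplied by the same factor in every cycle.
    return initial_ratio * (keep_perfect / keep_mismatched) ** cycles


for c in range(4):
    print(c, ratio_after_cycles(1.0, 0.9, 0.3, c))
```

Under this assumption the ratio grows geometrically with the number of cycles, while the absolute amount of perfectly matched hybrid falls by the factor `keep_perfect` each time.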

[0031] An array can be "3'" or "5'". "3' arrays" possess
free 3' termini and "5' arrays" possess free 5' termini.
The immobilized oligos in a 3' array can be extended at
their 3' termini by incubation with a nucleic acid
polymerase. If it is a template-directed polymerase, only
immobilized oligos hybridized to a template strand can be
extended.

[0032] Methods of oligodeoxyribonucleotide synthesis
directly on a solid support are also known in the art,
including methods wherein synthesis occurs in the 3' to
5' direction (so that the oligos will possess free 5'
termini). Methods wherein synthesis occurs in the 5' to 3'
direction (so that the oligos will possess free 3' termini)
are also known.

[0033] Suitable substrates or supports for arrays
should be nonreactive with reagents to be used in
processing, washable under stringent conditions, not
interfere with hybridization and not be subject to
inordinate non-specific binding. Examples include treated glass,
polymers of various kinds (e.g., polyamide and
polyacrylmorpholide), latex-coated substrates and silica chips.
[0034] Arrays can be made over a wide range of sizes.
In the example of a square sheet, the length of a side 
can vary from a few millimeters to several meters. 

II. Sorting nucleic acids 

[0035] The present invention allows mixtures of 
strands to be sorted according either to their terminal 
oligo segments ("terminal sorting") or their internal oligo 
segments ("internal sorting") on a binary array. 
[0036] There are two important aspects of our inven- 
tion for sorting. First, each strand in a mixture can be 
made to hybridize at only a few, or a single, location. 
And second, each strand can be provided with universal 
terminal priming regions that enable PCR amplification
without prior knowledge of the terminal nucleotide se- 
quences and without the need to synthesize individual 
primers. 

[0037] For terminal sorting, the priming region(s) can 
be made essentially dissimilar from the sequences oc- 
curring in the nucleic acids that are present in the mix- 
ture to be sorted, so that priming does not occur any- 
where but at the strands' termini. When strands from a 
complete restriction digest of a DNA are to be terminally 
sorted and amplified, priming only at the strands' termini
can be promoted by restoring the terminal restriction 
sites (those sites having been eliminated from internal 
regions by complete digestion) concomitant with the 
generation of terminal priming regions. 
[0038] Terminal sorting is carried out on a binary ar- 
ray, which preferably is sectioned. The immobilized oli- 
gos contain a constant segment complementary to ei- 
ther the strands' 3' priming region or 5' priming region. 
Thus, each strand can only be hybridized to one location 
within the array. By sorting on a comprehensive array, 
every strand is bound somewhere within the array. This 
is especially important for the preparation of a compre- 
hensive library of fragments of a long nucleic acid or a 
genome. 
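
Terminal sorting can be pictured as assigning each strand an address given by the variable sequence next to its priming region. The sketch below is a simplified illustration that assumes perfect hybridization only; the function `sort_by_terminus`, the priming sequence and the example strands are all hypothetical. Strands are written 5' to 3' and carry the priming region at the 3' terminus.

```python
from collections import defaultdict


def sort_by_terminus(strands: list[str], priming_region: str,
                     n: int) -> dict[str, list[str]]:
    """Group strands by the n nucleotides adjacent to a 3' priming region.

    The key is the strand's own variable sequence; the immobilized
    oligo at that address would carry its complement.
    """
    if not priming_region or n < 1:
        raise ValueError("priming_region must be non-empty and n >= 1")
    wells: dict[str, list[str]] = defaultdict(list)
    for strand in strands:
        if not strand.endswith(priming_region):
            continue  # no match to the constant segment: not retained
        body = strand[:-len(priming_region)]
        if len(body) < n:
            continue  # too short to match the full variable segment
        wells[body[-n:]].append(strand)
    return dict(wells)


primer = "GAATTC"  # hypothetical universal priming region
mixture = ["TTACG" + primer, "CCACG" + primer, "TTTTA" + primer, "ACGTACGT"]
print(sort_by_terminus(mixture, primer, 3))
```

Because the constant segment is the same everywhere, each retained strand has exactly one address, and on a comprehensive array every strand carrying the priming region has some address.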

[0039] Strands can be sorted on either 3' or 5' arrays 
in which the constant segment is located either up- 
stream or downstream of the variable segment. High 
specificity of sorting can be achieved by employing 3' 
arrays in which the constant segment of the immobilized 
oligos is upstream. In that case, sorting can be followed 
by the generation of an immobilized copy of each sorted 
strand using the immobilized oligos as primers for the 
synthesis of a complementary copy of that strand when 
the array is incubated with an appropriate DNA polymer- 
ase. The generation of copies covalently linked to the 
array enables the array to be vigorously washed to re- 
move non-covalently bound material before strand am- 
plification. It also enables the arrays to serve as perma- 
nent banks of sorted strands which can subsequently 
be amplified over and over to generate copies for further 
use. 

[0040] A strand sorting procedure is shown in Figure 
4. A DNA sample 10 is completely digested with a
restriction endonuclease. The ends of each fragment are
restored, and universal priming sequences 17 generat- 
ed in the process to prepare fragments 11 for sorting. It 
is not necessary that priming sequences be added at 
both ends, if only linear amplification is desired. Nor is 
it necessary that the priming sequence at the 3' end of 
a strand be the same as the priming sequence at the 5' 
end. 

[0041] The strands are then melted apart 12 and
hybridized to a terminal sequence binary sorting array,
whose immobilized oligos 14 contain a variable
segment 15 and a constant segment 16 which is
complementary to the universal priming region 17, including the
restored recognition site of the restriction enzyme 16a,
17a. Each strand is bound at a location dependent upon its
variable sequence 100 adjacent to its priming sequence.
At this point the array need not be sectioned. The array
is then washed to remove unhybridized strands. The
entire array is then incubated with DNA polymerase.
Consequently, a complementary copy 18 of each hybridized
DNA strand is generated by extension of the 3' end of
the oligo to which the strand is bound. The array is then
vigorously washed to remove the original DNA strands
and all other material not covalently bound to the surface
(not shown).

[0042] The covalently bound copy strands can be am- 
plified. During amplification it is usually desirable that 
the array be sectioned. The wells are filled with a solu- 
tion containing universal primers 19, 20, an appropriate 
DNA polymerase, and the substrates and buffer needed 
to carry out PCR. The array can, if desired, be sealed 
with a coversheet, further isolating the wells from each 
other. PCR is carried out simultaneously in each well of 
the array. This results in sorting the mixture of strands 
into groups of strands that share the same terminal oligo 
sequence, each strand (or each group of strands) being 
present in a different well of the array and amplified 
there. 

[0043] The results of hybridization can be improved 
by "proof-reading", or editing, the hybrids formed, by se- 
lectively destroying those hybrids that contain mis- 
matches, without affecting perfect hybrids. 
[0044] The length of the immobilized oligos in a strand
sorting array is chosen to suit the number of strands to
be sorted. When sorting strands according to their
terminal sequences, the number of different strands
obtained in each well equals the number of times that a
particular oligo complementary to the variable segment
of the immobilized oligo occurs among the termini of
different strands in the mixture. If the number of
nucleotides in each variable segment is n, then the total
number of such variable sequences is 4^n, and the mean
number of different strands in a well is N/4^n, where N is
the number of different strands in the mixture, provided
that nucleotide sequence is random, and that each of
the four nucleotides is present in equal proportion. If a
random sequence that is the size of an entire diploid
human genome (6 x 10^9 basepairs) is completely digested
by a restriction endonuclease that has a hexameric
recognition site, then the resulting mixture will contain
approximately 3 x 10^6 strands with an average length of
4,096 nucleotides. If this mixture is then applied to a
comprehensive binary array having variable segments
eight nucleotides long, then each well will contain, on
average, approximately 45 different strands.
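
The arithmetic in this example can be reproduced as follows. This is a minimal sketch under the same assumptions stated in the text (random sequence, equal nucleotide proportions); the function names are illustrative, and the factor of two reflects counting both strands of each double-stranded fragment.

```python
def strands_from_digest(genome_bp: float, site_len: int) -> float:
    """Expected number of single strands from a complete digest.

    A recognition site of site_len nucleotides is expected once every
    4**site_len basepairs in a random sequence.
    """
    mean_fragment_len = 4 ** site_len
    fragments = genome_bp / mean_fragment_len
    return 2 * fragments  # two strands per fragment


def mean_strands_per_well(num_strands: float, n: int) -> float:
    """Mean number of different strands per well: N / 4**n."""
    return num_strands / 4 ** n


strands = strands_from_digest(6e9, 6)
print(f"mean fragment length: {4 ** 6} nucleotides")
print(f"strands in mixture:   {strands:.2e}")
print(f"strands per well:     {mean_strands_per_well(3e6, 8):.1f}")
```

The computed values agree with the rounded figures quoted above: about 3 x 10^6 strands and roughly 45 strands per well for eight-nucleotide variable segments.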
[0045] The invention also includes methods for isolat- 
ing individual strands by sorting them according to the 
identity of their terminal sequences on sectioned binary 
arrays. The strands can be from restriction fragments or 
not, so long as unique priming sequences are added to 
at least one of the strand's termini, such as by methods 
described herein. If the number of different strands in a 
sample is rather small, there is a high probability that 
after the first stage of sorting, many wells will either not 
be occupied, or be occupied by only one type of frag- 
ment. In the case of a complex mixture of strands (such 
as from the digestion of an entire human genome), a 
number of different types of fragments will occupy each 
well. In that case, the isolation of individual fragments 
can be achieved by PCR amplifying the strands in each 
well in the first stage of sorting and then sorting the 
group of fragments from each well on a fresh sectioned 
array. After symmetric PCR amplification, each well of 
the first array will contain copies of the strands that were 
originally hybridized there, and also their complementa- 
ry copies. 

[0046] If the original strands were sorted by their 3' 
ends, then their copies in a given well will all possess 
the same 3'-terminal sequence, and their complemen- 
tary copies will possess the same 5' end. However, the 
3'-terminal sequences of the complementary copies of 
the original strands in each well will be different (as will 
be the 5' terminal sequences of the original copies). 
Therefore, the complementary strands will bind at dif- 
ferent locations within the new sectioned array, accord- 
ing to the identity of their own 3'-terminal sequences, 
and with a high probability, each of them will occupy a 
separate well, where they can then be amplified. 
[0047] Alternatively, the second stage of sorting can 
be carried out according to the identity of the terminal 
sequences at the other end of each strand. For example, 
if the strands were sorted in the first stage by their 3' 
ends (on an array whose immobilized oligos contain
upstream constant segments), then the groups of strands
from each well in the first array can be sorted in a second 
stage by their 5' termini (on an array having downstream 
constant segments). In either procedure, as a result of 
the second round of sorting, almost all of the different 
types of fragments are separated from one another (with 
the exception of virtually identical allelic strands from a 
diploid genome, which usually have identical termini, 
and consequently are sorted into the same well). The 
isolated strands can then be used for any purpose. For 
example, they can be inserted into vectors and cloned 
or they can be amplified and their sequences deter- 
mined. 



[0048] The invention also includes the use of binary 
arrays for isolating selected strands by sorting accord- 
ing to the identity of terminal sequences. Strands can, 
for example, be selected that contain particular regions 
(such as genes) of special interest from a clinical view- 
point. After the relevant portion of a genome has been 
sequenced, an array can be made using only preselect- 
ed oligos whose variable segments uniquely match the 
terminal sequences of the strands of interest, i.e., they 
would be long enough to uniquely hybridize to the de- 
sired strands. 

[0049] The invention also encompasses methods that 
include sorting fragments according to their internal se- 
quences. When so sorting, strands may bind at more 
than one well. This type of sorting can be useful for a 
number of applications, such as the isolation of strands 
that contain particular internal sequence segments (uti- 
lizing a sectioned ordinary array), or the sorting of 
strands according to the identity of variable oligo seg- 
ments adjacent to internal restriction sites of a particular 
type (utilizing a sectioned binary array). The latter ap- 
proach is useful for ordering sequenced restriction frag- 
ments. The sorting of strands by their internal segments 
on a 3' sectioned ordinary array is useful for the gener- 
ation of partial strands by virtue of extension of the im- 
mobilized oligos. 

[0050] The invention includes the sorting, in particular 
for sequencing, of natural mixtures of RNA molecules, 
such as cellular RNAs. Establishing messenger RNA 
sequences is useful, not only for the identification and 
localization of genes in the genomic DNA, but also for 
providing information necessary to determine the cod- 
ing gene sequences (i.e. the exon/intron structure of 
each gene). Furthermore, the analysis of cellular RNAs 
in different tissues, at different stages of development, 
and in the course of a disease, will clarify which genes 
are active. Usually, RNAs are short enough to be sorted 
and analyzed without preliminary fragmentation. 



[0051] The present invention includes methods of us- 
ing sectioned arrays for preparing all possible partial 
copies of a strand or a group of strands. Preparing com- 
plete sets of partials of a strand(s), and sorting the par- 
tials by their variable ends is especially useful in a proc- 
ess for determining the sequence of the strand or 
strands. The preparation of partials is accomplished by 
either of the following methods: (1) terminally sorting on
sectioned binary arrays a mixture of partial strands gen- 
erated by degradation of a "parental" strand(s) at ran- 
dom; or (2) generating partials on a sectioned ordinary 
array, through the sorting of a parental strand(s) accord- 
ing to the identity of the strand's internal sequences, fol- 
lowed by the synthesis of (complementary) partial cop- 
ies of the parental strand(s) by the enzymatic extension 
of the immobilized oligos, utilizing the hybridized
parental strands as templates, and then copying the
immobilized partials. By using comprehensive arrays, it is
possible to prepare every possible one-sided partial of a
strand. 

[0052] In the first case (partialing before sorting), a 
strand, or a double-stranded fragment, or a group of
either, carrying terminal priming regions (these can be a
strand or a group of strands sorted on a sectioned binary
array as described above) is randomly degraded by a
chemical or an enzymatic method, or by a combination 
of both. Then the mixture of partials is sorted on a sec- 
tioned binary array according to the identity of their new- 
ly generated termini, essentially as described above for 
the sorting of full-length strands by their terminal se- 
quences, with new priming sites being introduced at 
these new termini either before or after sorting. Only 
those partials that possess both the newly introduced 
priming site and the already existing priming site (at the 
opposite end), will be amplified by subsequent PCR. 
Partials can be sorted according to the identity of a var- 
iable sequence at either their 3' termini or their 5' termini. 
However, as is the case for the sorting of full-length 
strands, the highest specificity can be achieved by sort- 
ing according to the identity of a variable sequence at 
the 3' termini, and carrying out the sorting on 3' arrays 
having upstream constant segments, or by sorting ac- 
cording to the identity of a variable sequence at the 5' 
termini, and carrying out the sorting on 5' arrays having 
downstream constant segments. In these cases, sorting 
can be followed by the generation of immobilized (com- 
plementary) copies of the sorted partials. The arrays 
with the immobilized copies can serve as permanent 
banks of the sorted partials which can subsequently be 
amplified over and over to generate copies for further 
use. Following sorting, each well in the array will contain 
immobilized copies of all of those partials whose varia- 
ble end is complementary to the variable segment of the 
immobilized oligo. The other (fixed) end of these partials 
will be identical to one of the ends of the parental 
strands. If an oligo segment occurs more than once in 
a strand, or if it occurs in more than one strand in the 
group of strands subjected to partialing, then the well 
will contain a corresponding number of different partials, 
all sharing the same sequence at their variable ends. 
[0053] In the second case (sorting before partialing), 
partials are prepared directly from the parental strands 
that are hybridized to a sectioned ordinary array without 
prior degradation. A strand, or a mixture of strands, is 
hybridized to a 3' ordinary array. The immobilized oligos 
are then used as primers for copying the hybridized 
strands, beginning at the location within each bound 
strand where hybridization occurred, and ending at the 
upstream terminus of each bound strand. After exten- 
sion of the immobilized oligos, the hybridized parental 
strands are discarded. At this point the wells contain im- 
mobilized (complementary) partial strands. The partials 
in one well all share a 5'-terminal oligo segment that is 
complementary to a particular internal oligo in the
parental strand(s). The partial strands have 3'-terminal
sequences that include the complement of the 5'-terminal
region of the parental strand(s) (which contains a
priming region). Unlike the methods described above for
partialing before sorting, the immobilized complementary
partials will contain a priming region at only one end and
therefore cannot be amplified exponentially. However,
their linear amplification is possible, with the partials
being synthesized as DNAs or RNAs. Where RNA partials
are generated, the priming region at the partial copy's
3' terminus contains an RNA polymerase promoter.
Synthesis of RNA copies is more efficient than linear
synthesis of DNA copies. Alternatively, the synthesized
copies can be provided with second priming regions and
can then be amplified in an exponential manner by PCR.
This approach is illustrated, schematically, in Figure 5. 
[0054] Figure 5 illustrates the generation of partials 
for one DNA parental strand 30 on a 3' sectioned
ordinary array. First, the strand 30 (many copies, of course),
such as obtained from well 13a of sorting array 13, is
hybridized to the partialing array 31, a 3' sectioned
ordinary array, containing well 31a. The parental strand
30 binds to many different locations within the array,
depending on which oligo segments are present in the
strand. A hybrid 32 is formed in each well of the array
that contains an immobilized oligo complementary to a
strand's oligo segment. After hybridization, the entire
array is washed and incubated with an appropriate DNA
polymerase in order to extend the immobilized oligos
utilizing the hybridized strand as a template. Each
extension product strand 33 is a partial (complementary) copy
of the parental strand. Each partial begins at the place
32 in the strand where hybridization occurred and ends
at the strand's terminus. The strand preferably
terminates at its 5' terminus with a universal priming
sequence 17, such as one introduced into all strands when
sorting strands on a sectioned binary array as
described. This allows for amplification of the partials. That
priming sequence can contain a restored restriction site
16a. The parental strand may also contain, if it was
previously sorted on a binary sorting array, a priming
sequence at its 3' terminus 17, adjacent to the variable
sequence 100 that the strand was previously sorted by.
[0055] The entire array is then vigorously washed
under conditions that remove the parental DNA strands
and other material, preferably all, that is not covalently
bound to the surface. The areas of the array then contain
immobilized strands 33 that are complementary to a
portion of the parental strand. The wells can then be filled
with a solution containing the universal primer (or
promoter complement), an appropriate polymerase, and
the substrates and buffer needed to carry out multiple
rounds of copying of the immobilized partial strands.
The array can then be sealed, isolating the wells from
each other, and (linear) copying can be carried out
simultaneously in all of the wells in the array.

IV. Surveying oligonucleotides with binary arrays

[0056] The present invention includes using binary ar- 
rays to survey oligos contained in strands and partials. 
Binary arrays allow surveying to be improved as
compared with ordinary arrays, and they allow new types of
selective surveying (such as surveying "signature
oligonucleotides").

[0057] In surveying, strands first can be randomly
degraded into pieces whose average length slightly
exceeds the surveyed length. After degradation, each
resulting nucleic acid piece is ligated to the same type of
oligo (i.e., a constant sequence) that preferably does
not occur anywhere in the internal regions of the pieces.
For example, the sequence of the added oligo can
contain the recognition site of a restriction endonuclease
that was used to digest the DNA prior to fragment
sorting. The ligation can be carried out in solution prior to
hybridization, or after hybridization of the pieces to
binary immobilized oligos whose constant segment is
complementary to the oligo to be ligated. Preferably, a
3' array is used, having upstream constant segments.
The immobilized oligos can then be extended with an
appropriate DNA polymerase, using the hybridized
nucleic acid pieces as templates. It is preferable that after
extension all hybrids have the same length. This can be
achieved by employing dideoxy-nucleotides as
substrates for the polymerase, to restrict extension to one
nucleotide.

[0058] Hybrids can be labeled in both a
ligation-dependent and an extension-dependent manner to
increase the specificity of hybrid detection. Also, the
ligated oligos and the added dideoxy-nucleotides can be
tagged with different labels, for example, fluorescent
dyes of different colors. The array is then scanned at
two different wavelengths, and only those areas that
emit fluorescence of both colors indicate perfect
hybrids.

[0059] Survey results can be improved further by
hybrid proof-reading, that is, by destroying hybrids
containing mismatches using chemical or enzymatic
methods.

V. Use of the oligonucleotide arrays for the sequencing of nucleic acids

[0060] The arrays and methods of this invention can
be used to determine the nucleotide sequence of nucleic
acids, including the sequence of an entire genome,
whether it is haploid or diploid. This embodiment
requires neither cloning of fragments nor preliminary
mapping of chromosomes. It is especially significant that our
method avoids cloning, a labor-intensive and
time-consuming approach that is essentially a random search for
fragments. In a preferred embodiment a comprehensive
collection of whole nucleic acids or fragments is sorted
into discrete groups. The sorted nucleic acids are then
amplified with a polymerase, preferably by PCR.



[0061] Sequencing large diploid genomes, such as a 
human genome, using the arrays and methods of this 
invention is shown in Figure 6. We will describe the
overall method in general terms. In the embodiment
illustrated in Figure 6 an individual's genomic DNA 40 is
digested with a restriction endonuclease and sorted by
terminal sequences into groups of strands using a 3'
sectioned binary sorting array 13, as is described above in
Section II and illustrated in Figure 4.
[0062] Next, treating each well 13a of the sorting array
separately, a complete set of partials is prepared for 
each group of sorted strands using a sectioned array 
31, as is described above in Section III and illustrated 
in Figure 5. The partials can be generated in any chosen 
manner to make them detectable. 
[0063] Then the contents of each well 31a of the
partialing array 31 are surveyed using a survey array 42, as
is described above in Section IV. Preferably the survey 
array is a binary array, but an ordinary array may be 
used. In the embodiment shown in Figure 6, surveying 
is performed with a sheet 43 containing miniature survey 
arrays 42 that have been printed in a pattern that coin- 
cides with the number and location of the wells 31a. The 
oligo information obtained can be used, according to our 
invention, to separately determine the nucleotide se- 
quence of every strand in each group isolated on the 
sorting array. 

[0064] To determine the order of the fragments se- 
quenced as illustrated in the embodiment of Figure 6, 
genomic DNA 40 is digested with at least a second re- 
striction endonuclease and sorted into groups of strands 
using a 3' sectioned binary sorting array 44, as is de- 
scribed above in Section II and illustrated in Figure 4. 
The contents of each well 44a of the sorting array 44 are
surveyed with special survey arrays 45, 46 that identify 
"signature oligonucleotides" (described below) in inter- 
site segments of sorted fragments from different di- 
gests. This is done to determine the order of the frag- 
ments relative to one another without regard to differ- 
ences between allelic pairs of fragments. In the embod- 
iment shown in Figure 6 this surveying is performed with 
printed sheets 47, 48 that have been printed with a pat- 
tern of miniature arrays 45, 46. 
[0065] To allocate the ordered allelic fragments to 
their respective chromosomes in a diploid organism, 
fragments are linked according to their allelic differenc- 
es. In the embodiment illustrated in Figure 6, the strands 
from selected wells of the sorting array 44 are trans- 
ferred to a selected well of one of a series of partialing 
arrays 49, partials are generated, and the partials are 
surveyed using miniature survey arrays 50 on printed 
sheets 51. Only the presence of oligos containing allelic
differences in the selected partials needs to be deter- 
mined to link a pair of allelic fragments to their respective 
neighboring allelic fragments. 

[0066] When sorting according to the identity of
terminal sequences, each strand occupies a particular
"address" in the array. It is convenient to think of the address
as the oligo sequence within a strand that directs the
DNA strand to hybridize to a particular location, i.e., the
sequence that is perfectly complementary to the
variable sequence of the oligo immobilized at that location.
The "address" also identifies the location within the
array where the DNA binds.

[0067] After sorting, each group of strands is amplified
and subjected to partialing. Importantly, the isolation of
individual strands is not necessary, because our method
allows the nucleotide sequence of each strand in a
mixture to be determined. In particular, our method allows
the sequences of strands in a well of the sorting array
to be determined, separately from mixtures of strands
in other wells. In a preferred embodiment, the partialing
array is comprehensive in order to obtain all possible
one-sided partials (i.e., a comprehensive array). Each
group of partials is amplified prior to surveying. Most
preferably, the amplification is carried out in such a
manner that one of the two complementary partial strands is
produced in great excess over the other.
[0068] Each group of partials is surveyed to identify 
their constituent oligos. Surveying is preferably carried 
out using binary arrays. 

[0069] Although not necessary, it is preferable to have
the survey arrays be as compact as possible. It is
anticipated that surveying will be advantageously
accomplished simultaneously for many or all wells of a
partialing array by utilizing a sheet on which miniature survey
arrays have been "printed" in a pattern that coincides
with the arrangement of wells in the partialing array, in
a manner similar to that shown in Figures 6 and 7.
Referring to Figure 7, partialing array 31, comprising an
array of wells 31a, is surveyed using sheet 43, having
printed thereon an array of miniaturized survey arrays
42. The pattern of arrays 42 corresponds to the pattern
of wells 31a, whereby all wells 31a can be surveyed
simultaneously.

[0070] Automated photolithography techniques for
preparing miniature oligo arrays have been developed
[Fodor, S. P., Read, J. L., Pirrung, M. C., Stryer, L., Lu,
A. T. and Solas, D. (1991). Light-Directed, Spatially
Addressable Parallel Chemical Synthesis, Science 251,
767-773]. The manufacture of miniature arrays on a
"chip" for use in surveys also has been reported.
[0071] Surveying with comprehensive arrays
produces a complete list of oligos contained in the partials in
each well of the partialing array. This will reveal all oligos
present in all partials in that well. The method of this
invention can determine the sequences of the original
(parental) fragment strands.
[0072] The "partials" referred to in this section are
one-sided partial strands that begin at the 5' terminus of
a parental nucleic acid strand (the fixed end) and end at
different nucleotide positions in the strand (the variable
end). Partials are sorted in the partialing array according
to the identity of their variable ends, and therefore each
partial has a particular "address" within the array. As
with sorting arrays, an "address" in a partialing array is
the oligo sequence that is present at the variable end of
the partial strand and that is complementary to the
variable segment of an immobilized oligo. The "address"
also relates to the location within the array where the
partial strand is found, since the variable segment of the
oligo immobilized in that well is complementary to the
oligo at the partial's variable terminus. The "address"
also relates to the location within the parental strand of a
partial's terminal oligo. The location of this "address
oligo" within a parental strand is characterized by an
"upstream subset" of oligos that come before it in the
parental sequence and by a "downstream subset" of oligos
that come after it.

[0073] The method of establishing nucleic acid se- 
quences, for either a single strand or a group of parental 
strands sorted by their terminal sequences, begins by 
assembling an "address set" for each address in the par- 
tialing array. The "address set" is a comprehensive list 
of all oligos in all the parental strands which have the 
address oligo within their nucleotide sequences. The 
"upstream subset" contains all the oligos that occur up- 
stream (i.e., towards the 5' end) of the address oligo in 
parental strands that contain the address oligo. The 
"downstream subset" contains all the oligos that occur 
downstream (i.e., towards the 3' end) of the address ol- 
igo in any parental strands that contain the address ol- 
igo. Together the two subsets form the "address set." 
[0074] The upstream subset of each address can be 
determined directly from the survey of each well of a par- 
tialing array and consists of a list of all the oligos iden- 
tified as being present in the partial strands in that well. 
The downstream subset of each address can be inferred 
by examining the upstream subsets of all the addresses: 
the downstream subset of a particular address consists 
of those addresses whose own upstream subset in- 
cludes that particular address oligo. 
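The inference of downstream subsets from upstream subsets can be illustrated with a minimal sketch. All names here are hypothetical. The helper `toy_upstream_subsets` merely simulates survey results for a single known strand so that the inference step has data to work on, and it assumes that an address oligo is listed in its own upstream subset only when it occurs again upstream. Only `infer_downstream_subsets` corresponds to the rule stated above.

```python
def kmers(seq: str, n: int) -> list[str]:
    """All oligos of length n in seq, in 5'-to-3' order."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def toy_upstream_subsets(strand: str, n: int) -> dict[str, set[str]]:
    """Simulated survey: for each address oligo, the oligos found upstream of it.

    Assumption for illustration: one parental strand and a comprehensive
    partialing array, so every oligo of the strand is an address.
    """
    oligos = kmers(strand, n)
    upstream: dict[str, set[str]] = {}
    for i, address in enumerate(oligos):
        upstream.setdefault(address, set()).update(oligos[:i])
    return upstream


def infer_downstream_subsets(upstream: dict[str, set[str]]) -> dict[str, set[str]]:
    """Downstream subset of a = every address whose upstream subset contains a."""
    return {a: {b for b, ups in upstream.items() if a in ups} for a in upstream}


upstream = toy_upstream_subsets("ACGTTG", 3)
downstream = infer_downstream_subsets(upstream)
for address in kmers("ACGTTG", 3):
    print(address, sorted(upstream[address]), sorted(downstream[address]))
```

The inference needs no knowledge of the strand itself: it uses only the per-well oligo lists, which is why the sketch passes nothing but `upstream` to the second function.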
[0075] The upstream subset and the downstream 
subset of a particular address, taken together, are an 
"indexed address set". If an oligo occurs more than once 
in a strand, it can occur in both the upstream and the 
downstream subsets of an address. Indexed address 
sets provide the information required to order the oligos 
contained in a strand set, as will be described below. 
When a mixture of strands is examined, it is also useful 
to consider an address set without regard to which oli- 
gos occur upstream and downstream of an address. 
This is called an "unindexed address set". Unindexed 
address sets are decomposable into strand sets by the 
method of this invention. 

[0076] When assembling big strand sets whose oligos 
do not all overlap uniquely, it is advantageous to work 
with "sequence blocks" rather than with individual oli- 
gos. Sequence blocks are composed of oligos that 
uniquely overlap one another in a given strand set. Two 
oligos contained in a strand set are said to overlap if 
they share a terminal (5' or 3') n-1 nucleotide sequence.
An overlap is unique if no other oligo than those two in
the strand set has this sequence at its termini. Here n is
the length (in nucleotides) of each of the two oligos if
they are of the same length or, if they are of different
length, n is the length of the shorter one. We use unique
overlaps to construct sequence blocks from the oligos
in a strand set.
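The construction of sequence blocks from unique overlaps can be sketched as follows. This is an illustration under simplifying assumptions, not the procedure of the examples: all oligos are assumed to have the same length n (at least 2), and an overlap between oligo a and oligo b is treated as unique when a is the only oligo in the strand set ending in the shared n-1 nucleotides and b is the only one beginning with them. The function name `sequence_blocks` is hypothetical.

```python
from collections import defaultdict


def sequence_blocks(oligos: set[str]) -> list[str]:
    """Merge oligos of equal length n into blocks along unique (n-1) overlaps."""
    n = len(next(iter(oligos)))
    by_prefix: dict[str, set[str]] = defaultdict(set)
    by_suffix: dict[str, set[str]] = defaultdict(set)
    for oligo in oligos:
        by_prefix[oligo[:n - 1]].add(oligo)
        by_suffix[oligo[1:]].add(oligo)

    # successor[a] = b when a and b overlap uniquely (a immediately precedes b).
    successor: dict[str, str] = {}
    for shared, left in by_suffix.items():
        right = by_prefix.get(shared, set())
        if len(left) == 1 and len(right) == 1:
            a, b = next(iter(left)), next(iter(right))
            if a != b:
                successor[a] = b

    has_predecessor = set(successor.values())
    visited: set[str] = set()

    def walk(start: str) -> str:
        """Follow unique overlaps from start, appending one nucleotide per step."""
        block, current = start, start
        visited.add(start)
        while current in successor and successor[current] not in visited:
            current = successor[current]
            visited.add(current)
            block += current[-1]
        return block

    # Blocks start at oligos with no unique predecessor; any oligos left over
    # lie on a closed loop of unique overlaps and are opened at an arbitrary point.
    blocks = [walk(o) for o in sorted(oligos) if o not in has_predecessor]
    blocks += [walk(o) for o in sorted(oligos) if o not in visited]
    return blocks


# Strand set of the toy strand ACGTCGA surveyed with 3-mers: the dinucleotide
# CG ends two oligos and begins two oligos, so overlaps through CG are not unique.
print(sequence_blocks({"ACG", "CGT", "GTC", "TCG", "CGA"}))
```

In the toy strand set, the non-unique overlap through CG splits the oligos into three blocks, while a strand set in which every overlap is unique collapses into a single block equal to the strand.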
[0077] The position of each sequence block relative
to the others is determined from the distribution of the
oligos between the upstream and downstream subsets
of every address. This is accomplished by finding, for
each of the blocks, which blocks occur upstream, and
which blocks occur downstream, of that block by
examining the address sets. The address sets are used in
order to generate "block sets." The block sets are
address sets wherein blocks have been substituted for the
oligos that comprise the blocks, including the address
oligo. Once the relative position of the sequence blocks
has been determined, they can be assembled into the
final sequence. The assembly is governed by the
following rules: (1) each of the blocks must be used at least
once, (2) the blocks must be assembled into a single
sequence, (3) the ends of neighboring blocks must match
each other (i.e., overlap by an n-1 nucleotide sequence,
see above) and (4) the order of the blocks must be
consistent with their positions relative to one another, as
ascertained from the block sets, as will be clear from the
examples.

[0078] A sequence block can occur either once in a
sequence, or more than once, and this we determine by
examining the block sets. If a block occurs more than
once in a sequence, it will always be contained in both
its own upstream and downstream subsets. On the other
hand, if a block occurs only once in a sequence, it
may or may not be present in its own upstream or
downstream subset. But, if a block is absent from either its
upstream subset, or from its downstream subset, that block
occurs in the strand only once. The relative order of
these "unique" blocks can be determined by noting
which of them occur in the upstream subset, and which
of them occur in the downstream subset, of the others.
Once the unique blocks have been ordered relative to
each other, the gaps between them are filled with blocks
that may be non-unique. However, not every gap can
necessarily be filled in with a particular block. There is
a range of locations within which each non-unique block
(or presumably-non-unique block) can be present. The
range for a particular block is determined by noting
those blocks that always occur upstream of it, and those
blocks that always occur downstream of it. A gap can
be filled in if, and only if, there is a block or a combination
of blocks, whose outer ends have n-1 nucleotide-long
perfect sequence overlaps with the ends of the blocks
that form the gap. Because at least two overlaps, each
of low probability, must occur simultaneously, it is highly
unlikely that more than one block, or one combination
of blocks, can fill a gap. If a particular block occurs many
times in a strand, it will have to be used to fill every gap
it matches. This is why, using the method of the
invention, it is possible to establish the sequence of a strand
without measuring how many times an oligo occurs in
the partials. It is only necessary to determine whether
an oligo is present or not.
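The test for "unique" blocks and their relative ordering can be sketched briefly. The block sets below are invented for illustration and correspond to a hypothetical strand whose blocks occur in the order A, B, C, B, D. The function names are not from the specification, and gap filling is not attempted here.

```python
def unique_blocks(up: dict[str, set[str]], down: dict[str, set[str]]) -> set[str]:
    """Blocks absent from their own upstream or downstream subset occur only once."""
    return {b for b in up if b not in up[b] or b not in down[b]}


def order_unique_blocks(up: dict[str, set[str]], down: dict[str, set[str]]) -> list[str]:
    """Order unique blocks by how many other unique blocks lie upstream of each."""
    uniq = unique_blocks(up, down)
    return sorted(uniq, key=lambda b: len(up[b] & uniq))


# Hypothetical block sets for the block order A, B, C, B, D (B occurs twice).
up = {"A": set(), "B": {"A", "B", "C"}, "C": {"A", "B"}, "D": {"A", "B", "C"}}
down = {"A": {"B", "C", "D"}, "B": {"B", "C", "D"}, "C": {"B", "D"}, "D": set()}

print(sorted(unique_blocks(up, down)))
print(order_unique_blocks(up, down))
```

Block B appears in both of its own subsets and is therefore only presumed to be non-unique. It would be placed afterwards, in whichever gaps between A, C and D its ends match.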

[0079] An important aspect of this invention is the abil- 
ity to sequence a mixture of strands simultaneously. The 
invention can be used for the determination of fragment 
sequences from an entire fragmented and sorted ge- 
nome. 

[0080] If one strand is being sequenced, all address 
sets determined from a partialing array will contain the 
same oligos that constitute the strand set. The only dif- 
ference is that some oligos which are downstream in 
one set may be upstream in another address set. If a
mixture of strands has been partialed on a single
partialing array, certain addresses will be shared by more
than one parental strand. Their address sets will be
composite, containing all of the oligos from all of the
strands that the address oligo is present in. Addresses
that are only found in a particular strand in the mixture,
however, will have address sets which only contain
oligos from that strand. These address sets are identical to
the strand set, and each contains the same oligos. The
mixture can contain up to a hundred or so different DNA strands, each
of a different length and sequence, as can be obtained 
with an appropriate sorting array (or set of sorting ar- 
rays) and method described above. When a mixture of 
strands is analyzed on a partialing array, the data ob- 
tained by surveying the partials will reflect the diversity 
of the sequences in the mixture, and will appear to be 
very complex. However, we have discovered a way to 
decompose the unindexed address sets obtained by 
analysis of a strand mixture into their constituent strand 
sets. Then, as we have described for sequencing a sin- 
gle strand, the oligos in each of the identified strand sets 
can be grouped into sequence blocks that can be or- 
dered from the information contained in the indexed ad- 
dress sets, as will be clear from the examples. 
[0081] Unindexed address sets can be either "prime"
or "composite." A prime set consists of one strand set,
while a composite set consists of more than one. A
prime set cannot be decomposed into other address 
sets, i.e., there is no address set which is a subset of a 
prime set. Composite sets, however, can usually be de- 
composed into two or more simpler address sets. Once 
individual strand sets have been identified, they can 
each be treated as though they were obtained from an 
analysis of a homogeneous strand. It is thus possible, 
in many cases, to sequence all strands in an unknown 
heterogeneous DNA sample without first isolating the
strands. 
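The distinction between prime and composite unindexed address sets can be illustrated with a small sketch. It applies only the criterion stated above (a prime set has no other address set as a subset) to a toy mixture of two strands. The strands, the 3-mer length, and the function names are assumptions made for illustration, and the sketch is not the decomposition procedure of the examples.

```python
def kmers(seq: str, n: int) -> set[str]:
    """All oligos of length n in seq."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def prime_sets(address_sets: dict[str, frozenset[str]]) -> set[frozenset[str]]:
    """Address sets that contain no other address set as a proper subset."""
    distinct = set(address_sets.values())
    return {s for s in distinct if not any(t < s for t in distinct)}


def decompose(address_set: frozenset[str],
              primes: set[frozenset[str]]) -> list[frozenset[str]]:
    """Prime sets contained in a given (possibly composite) address set."""
    return [p for p in primes if p <= address_set]


# Toy mixture: two strands that share the oligo GTA.
strand_sets = [frozenset(kmers(s, 3)) for s in ("ACGTA", "GGTAT")]

# Simulated unindexed address sets: each address collects the oligos of every
# strand that contains the address oligo.
address_sets: dict[str, frozenset[str]] = {}
for strand_set in strand_sets:
    for oligo in strand_set:
        address_sets[oligo] = address_sets.get(oligo, frozenset()) | strand_set

primes = prime_sets(address_sets)
print(len(primes), "prime sets")
print(len(decompose(address_sets["GTA"], primes)), "strand sets in the GTA address set")
```

Here the address sets of oligos confined to one strand are prime and equal that strand's set, while the address set of the shared oligo GTA is composite and decomposes into both strand sets.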

[0082] The fragment sequences obtained by the
methods outlined above or by any other method can
then be put in their correct order using oligo arrays.
Assembling restriction fragments into contiguous
sequences can be accomplished by identifying each
fragment's immediate neighbors. One method for obtaining
this information is to use another restriction enzyme to
cleave the same DNA at different positions, thus
producing a set of fragments that partially overlap
neighboring fragments from the first digest, and then to
sequence these fragments. However, it is not necessary
to sequence the fragments in the second restriction
digest. It is only necessary to uniquely identify overlapping
segments in the fragments from alternate restriction
digests. This can be done by surveying "signatures".
[0083] Signatures can be determined by hybridization 
of fragment strands to complementary oligo probes. A 
signature of a fragment may consist of one, two or more
oligos, so long as it is unique within the sequence
analyzed. Neighboring fragments from one restriction digest
can be determined by looking for their signatures in 
overlapping fragments from an alternate digest. 
[0084] The invention includes a method for identifying 15 
neighboring restriction fragments among the list of se- 
quenced fragments that does not require either cloning 
or sequencing of overlapping fragments. If strands from 
an alternate digest are sorted, complementary strands 
of the same fragment will hybridize to different
addresses in the sorting array. Whenever intersite segments
from two or more fragments of the first digest are present
within one fragment of the second digest, then all of
these segments will be represented in both
complementary strands of that one fragment, and all will be present
wherever those strands bind in a sorting array. The seg- 
ments are identified by obtaining their signatures 
through hybridization to specialized binary survey ar- 
rays. The signatures of intersite segments that occur in 
one fragment always accompany each other, whereas
signatures of distant segments travel independently. 
[0085] After the fragments from an original (first)
restriction digest of a long DNA have been sequenced, the
same DNA is digested with a second (different)
restriction endonuclease, the termini of the generated
fragments are provided with universal priming regions (that
also restore the recognition sites at the termini), and the 
strands are sorted according to particular internal se- 
quences, namely, a variable sequence adjacent to the 
recognition site for the first restriction enzyme. The sort- 
ing array is a sectioned binary array. It contains immo- 
bilized oligos having a variable sequence as well as an 
adjacent constant sequence that is complementary to 
the recognition sequence of the first restriction endonu- 
clease. The sorted strands are amplified by "symmetric" 
PCR, so that in each well where a strand has been 
bound, copies of the bound strand, as well as comple- 
ments, are generated. In another embodiment, strands 
can be sorted according to their terminal sequences on 
an array whose oligos' constant segments include
sequences that are complementary to the recognition site
of the second restriction enzyme. This alternative is not
detailed; it corresponds to the embodiment discussed
below, except that the strands are sorted by their
terminal sequences.
[0086] Each strand that hybridizes to the binary
sorting array will possess at least two recognition sites for
the second restriction enzyme (restored at the strand's 
termini), and at least one (internal) recognition site for
the first restriction enzyme. The segments included
between these two types of restriction sites (intersite
segments) comprise the overlaps between the two types of
restriction fragments, and each intersite segment is thus
bounded by any two restriction sites of the two types. It
follows that each of these segments can be characterized
by identifying these two restriction sites and the
variable sequences of preselected length within the segment
that are immediately adjacent to each of the restriction 
sites. The combination of a recognition site (for either 
the first or the second restriction enzyme) and its adja- 
cent variable oligo we call a "signature oligonucleotide". 
Every intersite segment can be characterized by two 
signature oligos (of either type) that bound that seg- 
ment. The combination of the two signature oligos is de- 
fined herein as the intersite segment's "signature". 
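
The definition of a signature can be illustrated with a short Python sketch. It is not part of the described method: it assumes a single strand written 5' to 3', two hypothetical recognition sequences (GAATTC and GGATCC), a toy sequence, and a variable oligo length of 4, and the function name intersite_signatures is invented for the example.

```python
import re

def intersite_signatures(seq, site1, site2, k):
    """Return (left signature oligo, right signature oligo) for each
    intersite segment of seq, reading one strand only."""
    # Locate every occurrence of either recognition site.
    hits = sorted(
        (m.start(), site)
        for site in (site1, site2)
        for m in re.finditer(f"(?={re.escape(site)})", seq)
    )
    signatures = []
    for (p1, s1), (p2, s2) in zip(hits, hits[1:]):
        start = p1 + len(s1)   # first base after the left site
        end = p2               # first base of the right site
        if end - start < k:    # too short to hold a variable oligo
            continue
        left = s1 + seq[start:start + k]   # site + adjacent variable oligo
        right = seq[end - k:end] + s2      # adjacent variable oligo + site
        signatures.append((left, right))
    return signatures

# Toy sequence (an assumption): site1, 10 bases, site2, 10 bases, site1.
toy = "TT" + "GAATTC" + "ACGTAAGGCT" + "GGATCC" + "TTGACCAGTA" + "GAATTC" + "CC"
print(intersite_signatures(toy, "GAATTC", "GGATCC", 4))
# [('GAATTCACGT', 'GGCTGGATCC'), ('GGATCCTTGA', 'AGTAGAATTC')]
```

Each pair returned corresponds to one intersite segment; the two elements are the signature oligos that bound it.
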
[0087] After strand amplification, the strands in the 
wells of the sorting array are surveyed to identify the 
signature oligos of each of the two types. This is carried 
out by using two types of binary survey arrays. The first 
has immobilized oligos containing a variable oligo seg- 
ment and a constant segment that is, or includes, an 
adjacent sequence that is complementary to the
recognition site for the first restriction endonuclease. The
immobilized oligos in the second survey array have a
variable oligo segment of preferably the same length as the
variable segment of the first specialized survey array, 
and a constant segment that is, or includes an adjacent 
sequence that is complementary to the recognition site 
for the second restriction endonuclease. The constant 
oligo segments in these arrays can be located either up- 
stream or downstream of the variable oligo segments, 
resulting in the surveying of either the downstream or 
the upstream signature oligos in each strand of the in- 
tersite segments being surveyed. In a preferred embod- 
iment the constant oligo segments are upstream, and 
the immobilized oligos have free 3' ends, so that they 
can be extended by incubation with a DNA polymerase. 
From the oligo information that is obtained, the se- 
quenced fragments can be ordered relative to one an- 
other. 
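
The ordering step can be sketched in Python under simplifying assumptions. The data layout (a dictionary of terminal signature oligos per sequenced fragment, and a dictionary of signature oligos detected per well of the sorting array), the labels, and the function name neighboring_fragments are all hypothetical. The sketch reports pairs of first-digest fragments whose signature oligos were detected together in one well; when a strand from the second digest covers more than two first-digest fragments, non-adjacent pairs are reported as well and would need to be resolved separately.

```python
from itertools import combinations

def neighboring_fragments(fragment_ends, wells):
    """fragment_ends: {fragment name: set of its terminal signature oligos}
    wells: {well address: set of signature oligos detected in that well}
    Returns unordered fragment pairs that share at least one well."""
    pairs = set()
    for detected in wells.values():
        present = sorted(
            name for name, oligos in fragment_ends.items()
            if oligos & detected          # fragment contributes to this well
        )
        pairs.update(combinations(present, 2))
    return pairs

# Hypothetical example: s1..s6 stand for signature oligos.
ends = {"F1": {"s1", "s2"}, "F2": {"s3", "s4"}, "F3": {"s5", "s6"}}
wells = {"w1": {"s2", "s3"}, "w2": {"s4", "s5"}, "w3": {"s6"}}
print(sorted(neighboring_fragments(ends, wells)))
# [('F1', 'F2'), ('F2', 'F3')]
```
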

[0088] In the method of the invention, the uniqueness 
of a signature is achieved by surveying "half signatures" 
(signature oligonucleotides) on two relatively small
survey arrays. If the variable segments in the arrays are
8 nucleotides long, the number of areas in the two arrays
is approximately 130,000, or approximately 
100,000,000 times smaller than the single array that 
would be needed for detecting the same size signature 
(28 nucleotides). 
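
The count of approximately 130,000 areas can be checked with a few lines of Python. The sketch assumes only that a comprehensive array holds one area for every possible variable segment; the function name areas is an illustrative choice.

```python
def areas(variable_length):
    """Areas in a comprehensive array: one per possible oligo of that length."""
    return 4 ** variable_length

# Two survey arrays, each with 8-nucleotide variable segments.
two_survey_arrays = 2 * areas(8)
print(areas(8), two_survey_arrays)   # 65536 131072
```
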

[0089] If a diploid genome (such as a human genome) 
is sequenced, the ordered fragments will appear as a 
string of unlinked pairs of allelic fragments. What re- 
mains unknown is how the allelic fragments in each pair 
are distributed between the homologous (sister) chro- 
mosomes that came from each parent. Allocation of the 
allelic fragments to these "chromosomal linkage 
groups" requires knowledge of which fragment in each
pair is linked to which fragment in a neighboring pair.
[0090] The invention also includes a method that uses
arrays for allocating allelic fragments to chromosomes,
irrespective of what method was used for sequencing
and ordering the fragments. The linkage of fragments in
neighboring pairs can be achieved by sequencing a re- 
striction fragment ("spanning fragment") from an alter- 
nate digest that spans at least one allelic difference in 
each pair. Since the sequences of the allelic fragments 
are known, there is no need to sequence the spanning 
fragment. Instead, one can simply determine which oli- 
gos that harbor allelic differences accompany one an- 
other in the spanning fragment, i.e., which oligos occur 
in the same chromosome. This can be accomplished by 
surveying, at a selected address in a partialing array, 
partials generated from a selected group of restriction 
fragments from an alternate digest. A group of restriction 
fragments is selected that contains a spanning frag- 
ment, and an address in a partialing array is selected 
that encompasses a difference in one of the neighboring 
allelic pairs. 

[0091] Since the sequence of every fragment is 
known, it is possible to choose an alternate restriction 
fragment that spans the allelic differences in the neigh- 
boring pairs. A spanning restriction fragment, in fact, 
may already be present at a particular address in one 
of the sorting arrays used to sort alternate digests during 
the ordering procedure. 

[0092] In this method, sorted strands are melted 
apart, and the mixture is hybridized to a particular well 
in the partialing array, whose address corresponds to 
one of the allelic oligos. Two different wells are selected, 
each with an address that corresponds to an oligo that 
harbors a different allelic oligonucleotide. After
amplification of the partial strands, the oligos in the two wells
are identified with a survey array. Examination tells
which fragments are on the same chromosome.
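
The final examination step can be pictured with a minimal Python sketch. It assumes idealized, error-free survey results; the data layout, the example oligos, and the function name link_alleles are hypothetical. For each of the two selected wells, it looks up which allelic oligo of the neighboring pair was surveyed in that well.

```python
def link_alleles(well_contents, neighbor_alleles):
    """well_contents: {reference allelic oligo (well address):
                       set of oligos surveyed in that well}
    neighbor_alleles: the allelic oligos of the neighboring pair
    Returns {reference oligo: neighbor oligo found in the same well}."""
    linkage = {}
    for reference, surveyed in well_contents.items():
        found = [a for a in neighbor_alleles if a in surveyed]
        if len(found) == 1:    # keep only unambiguous results
            linkage[reference] = found[0]
    return linkage

# Hypothetical data: two wells, one per allele of the first pair.
wells = {
    "ACGTACGA": {"ACGTACGA", "TTGACCAA", "GGCATTCA"},
    "ACGTACGT": {"ACGTACGT", "TTGACCAG", "GGCATTCA"},
}
print(link_alleles(wells, ["TTGACCAA", "TTGACCAG"]))
# {'ACGTACGA': 'TTGACCAA', 'ACGTACGT': 'TTGACCAG'}
```
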
[0093] Since allelic differences occur roughly once 
every 1,000 basepairs in the human genome, most
allelic fragments resulting from digestion with a restriction
enzyme recognizing a hexameric sequence (resulting in
an average fragment length of about 4,096 basepairs) will
differ from each other.
If the variable oligo segments in the survey arrays are 
made of octanucleotides, then each allelic nucleotide 
substitution will give rise to eight different oligos in each 
of the allelic fragments. However, using our method, in- 
spection of only one address in the partialing array is 
sufficient to reveal the linkage of the corresponding ref- 
erence oligo to any one of the eight oligos that encom- 
pass the nucleotide substitution that occurs in the neigh- 
boring fragment on the same chromosome. Therefore, 
only one address in the partialing array is needed to re- 
veal the linkages between two neighboring allelic pairs. 
Thus, 65,536 linkages can be determined on a single 
comprehensive partialing array made of variable octa- 
nucleotides. With this method, only 10 to 20 of these 
arrays would be needed to complete the assembly of an 
entire diploid human genome that has been fragmented
by a restriction endonuclease with a hexameric
recognition site.
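
The arithmetic in this paragraph can be retraced with a rough Python sketch. The haploid genome size of 3,000,000,000 basepairs is an assumption, as are the random-sequence model for site spacing, the simplification of one partialing address per neighboring allelic pair, the toy sequence, and the helper name oligos_spanning.

```python
def oligos_spanning(seq, pos, k):
    """All k-long oligos of seq that include position pos (0-based)."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(first, last + 1)]

# A substitution well inside a strand is covered by eight octanucleotides.
toy = "ACGTTGCAAGCTTAGGCTAACCGT"
print(len(oligos_spanning(toy, 12, 8)))   # 8

HEXAMER_SPACING = 4 ** 6       # 4,096: expected spacing of a 6-base site
ADDRESSES = 4 ** 8             # 65,536 addresses on an octanucleotide array
HAPLOID_BP = 3_000_000_000     # assumption: approximate haploid genome size

allelic_pairs = HAPLOID_BP // HEXAMER_SPACING    # one pair per fragment position
arrays_needed = -(-allelic_pairs // ADDRESSES)   # ceiling division
print(allelic_pairs, arrays_needed)              # 732421 12
```

Under these assumptions the estimate falls within the stated range of 10 to 20 arrays.
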

[0094] Computational methods can be developed to 
minimize or eliminate errors that occur during partialing 
and surveying, by taking advantage of the high redun- 
dancy in the data. Such methods should take into ac- 
count the following aspects of a preferred sequencing 
procedure: the sequence of every fragment is independ- 
ently determined four times (by virtue of each strand and 
its complement being present at two different addresses 
in the sorting array); each strand set is determined in as 
many trials as the number of different oligos in that 
strand; every nucleotide in a strand is represented by 
as many different oligos as the length (of the variable 
segment) of the immobilized oligos in the survey array; 
the locations where a particular block can occur in a se- 
quence are limited by the distribution of the blocks 
among the upstream and downstream subsets of each 
pertinent address; and the edges of a block must be 
compatible with the edges of each gap where that block 
is inserted. 

[0095] Using our genome sequencing method, one
can use essentially the same technology throughout,
i.e., hybridization of oligo probes and the amplification of
nucleic acids by the polymerase chain reaction, both of 
which are well-studied, common laboratory techniques. 
The entire procedure can be performed by a specially 
designed machine, resulting in huge reductions in time 
and cost, and a marked improvement in the reliability of 
the data. Many arrays could be processed simultane- 
ously on such a machine. The machine most preferably 
should be entirely computer-controlled, and the compu- 
ter should constantly analyze intermediate results. As 
stated above, used arrays can be stored, both to serve 
as a permanent record of the results, and to provide ad- 
ditional material for subsequent analysis or for manipu- 
lating the sequenced strands and partials. 
[0096] Analysis of an individual's genomic DNA pro- 
vides the complete nucleotide sequence of that individ- 
ual's diploid genome. The genes and their control ele- 
ments are allocated into chromosomal linkage groups 
as they appear in a single living organism. The se- 
quence will describe an intact, functioning ensemble of 
genetic elements. This complete sequencing provides 
the ability to compare genomes of individuals, thereby 
enabling biologists to understand how genes function 
together and to determine the basis of health and dis- 
ease. The genomes of any species, whether haploid or 
diploid, can be sequenced. 

[0097] The invention can be used not only for sequencing
DNAs but also for sequencing mixtures of cellular RNAs.
[0098] The invention is also useful to determine se- 
quences in a clinical setting, such as for diagnosis of 
genetic conditions.

VI. Examples

1. Sorting nucleic acids or their fragments on a binary
oligonucleotide array whose immobilized oligos have
free 3' termini, with constant upstream segments

[0099] This method allows the immobilized oligos to 
serve as primers for copying bound strands, resulting in 
the formation of complementary copies covalently 
linked to the array. 

1.1. Sorting restriction fragments according to their 
terminal sequences, following the introduction of 
terminal priming regions

[0100] DNA is digested using a restriction endonucle- 
ase. Recognition sites for the restriction endonuclease 
are restored in solution by introducing terminal
extensions (adaptors) that contain a sequence which,
together with the restored restriction site, forms a universal
priming region at the 3' terminus of every strand in the 
digest. This priming region is later used for amplification 
by PCR. After melting fragments, the strands are sorted 
on a sectioned binary array. A sequence complementary 
to the generated priming region serves as both the con- 
stant segment of the immobilized oligos and as the prim- 
er for PCR amplification of the bound strands. 
[0101] DNA to be analyzed is first digested substan- 
tially completely with a chosen restriction endonucle- 
ase, and the fragments obtained are then ligated to syn- 
thetic double-stranded oligo adaptors. The adaptors 
have one end that is compatible with the fragment ter- 
mini. The other end is not compatible with the fragments' 
termini. The adaptors can therefore be ligated to the 
fragments in only one orientation. The adaptors' strands 
are non-phosphorylated, which prevents their
self-ligation. The strands in the restriction fragments have their
5' termini phosphorylated, which results from their
cleavage by a restriction endonuclease. This favors the
ligation of the adaptors by a DNA ligase (such as the DNA
ligase of T4 bacteriophage) to the restriction fragments,
rather than to each other. Since DNA ligase catalyzes
the formation of a phosphodiester bond between adja- 
cent 3' hydroxyl and phosphorylated 5' termini in a dou- 
ble-stranded DNA, the phosphorylated 5' termini of the 
fragments are ligated to the adaptor strand whose 3' end 
is at the compatible side of the adaptor. The 3' termini 
of the fragments remain unligated. A DNA polymerase 
possessing a 5'-3' exonuclease activity (such as DNA 
polymerase I from Escherichia coli or Taq DNA polymer- 
ase from Thermus aquaticus) is then used to extend the 
3' ends of the fragments, utilizing the ligated oligo as a 
template, concomitant with displacement of the unligat- 
ed oligo. To make the ligated oligo resistant to the 5'-3' 
exonuclease, the ligated oligo can be synthesized from 
α-phosphorothioate precursors.
[0102] Although the oligo adaptors are provided in
great excess during the ligation step, there is still a low
probability that two restriction fragments will ligate to
one another, rather than to the adaptor. To prevent this,
the ligation products can again be treated with the
restriction endonuclease used to generate the fragments,
in order to cleave the formed interfragment dimers. The
endonuclease will not cleave the ligated adaptors if they
are synthesized from modified precursors (such as
nucleotides containing N6-methyl-deoxyadenosine),
which are known and currently commercially available
[e.g., from Pharmacia LKB]. Resistance of the ligated
adaptors to digestion by the restriction endonuclease
can be increased further if the ligated oligo is
synthesized from phosphorothioates, and if phosphorothioate
analogs of the nucleoside triphosphates are used as
substrates for extension of the 3' termini.

[0103] After the priming regions have been added, the
complementary strands are melted apart, such as by
increasing temperature and/or by introducing denaturing
agents such as guanidine isothiocyanate, urea, or
formamide. The resulting strands are hybridized to a binary
sorting array, such as by following a standard protocol
for the hybridization of DNA to immobilized oligos.
Hybridization is performed so that formation of only
perfectly matched hybrids is promoted. The hybrids have a
length which is equal to that of the immobilized oligos.
The immobilized oligos are attached to the array at their
5' termini and contain constant restriction site segments
adjacent to a variable segment of predetermined length.
Each strand will be bound to the array at its 3' terminus.
Its location within the array will be determined by the
identity of the oligo segment that is located in the strand
immediately upstream from the restored restriction site
at its 3' end, and that is complementary to the variable
segment of the immobilized oligo to which it is bound.
After hybridization and washing away all unbound
material, the entire array is incubated with a DNA
polymerase, such as Taq DNA polymerase or the DNA
polymerase of bacteriophage T7, and substrates
(deoxyribonucleotide 5' triphosphates). As a result, the 3' end of
each immobilized oligo to which a strand is bound will
be extended to produce a complementary copy of the
bound strand. The array is vigorously washed. The wells
are then filled with a solution containing universal
primer, an appropriate DNA polymerase, and the substrates
and buffer needed to carry out PCR. The array is then
sealed, isolating the wells from each other, and
exponential amplification is carried out, preferably
simultaneously, in each well.
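
The sorting rule in this example can be modeled with a minimal Python sketch that assumes idealized, perfectly matched hybridization. The priming region GAATTCGG, the variable length of 3, the toy strands, and the function names are assumptions made for illustration. The address of a well is taken to be the variable segment of its immobilized oligo, read 5' to 3'.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def sort_strands(strands, priming_region, k):
    """Group strands (written 5'->3') by the address they would bind."""
    wells = {}
    for strand in strands:
        if not strand.endswith(priming_region):
            continue                  # no 3'-terminal priming region: not bound
        body = strand[:-len(priming_region)]
        if len(body) < k:
            continue
        variable = body[-k:]          # oligo immediately upstream of the priming region
        # The immobilized oligo is complementary to the strand's 3' end;
        # its variable segment is the reverse complement of `variable`.
        address = reverse_complement(variable)
        wells.setdefault(address, []).append(strand)
    return wells

strands = ["AAACGTGAATTCGG", "TTTCGTGAATTCGG", "CCCAAAGAATTCGG", "GGGGGG"]
print(sort_strands(strands, "GAATTCGG", 3))
# {'ACG': ['AAACGTGAATTCGG', 'TTTCGTGAATTCGG'], 'TTT': ['CCCAAAGAATTCGG']}
```

Strands that share the same oligo next to the priming region land in the same well, which is why each well generally holds a group of strands rather than a single one.
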

1.2. Sorting restriction fragments according to their
terminal sequences, with 3' and 5' terminal priming
regions being introduced, one before and one after
strand sorting

[0104] This procedure consumes larger amounts of
enzymes and substrates than the procedure described
in Example 1.1; however, only those strands that are
correctly bound to the immobilized oligos acquire both
priming regions necessary for PCR. The possibility that
non-specifically bound strands will be amplified is
minimized. Furthermore, different priming regions can be
introduced at different termini of a strand. It then becomes
possible to: (1) perform "asymmetric" PCR, where only
one of the complementary strands is accumulated in
significant amounts, and remains single-stranded; (2)
introduce a transcriptional promoter into only one of the
priming regions, in order to be able to obtain RNA
transcripts of only one strand (without also producing its
complement); (3) differentially label complementary
strands; and (4) avoid self-annealing of the strand's
terminal segments that can interfere with primer
hybridization and lower PCR efficiency.

[0105] In this example, digestion of DNA, adaptor
ligation and re-digestion of fragments are carried out as
described in Example 1.1, above. The 3' ends of the
restriction fragments, however, are not extended by
incubation with DNA polymerase. Instead, the strands
ligated at their 5' ends to adaptors are melted apart from
their unextended complements and hybridized to a
binary array. The array contains immobilized oligos that
are pre-hybridized with shorter complementary
5'-phosphorylated oligos that cover (mask) the immobilized
oligos except for a segment which includes a variable
region and a region complementary to the portion of the
restriction site remaining at the fragments' (unrestored)
3' end. The masked region includes the rest of the
restriction site and any other constant sequence, such as
may be included in a priming region. Hybridization is
carried out under conditions that promote the formation of
only perfectly matched hybrids which are the length of
the unmasked segment of the immobilized oligo. After
washing away the unbound strands, the strands that
remain bound are ligated to the masking oligos by
incubation with DNA ligase. The correctly bound strands
thus acquire a priming region at their 3' end, in addition
to the priming region they already have at their 5' end.
The two priming regions preferably correspond to
different primers. The array is then washed under
appropriately stringent conditions to remove all nucleic acids
except the immobilized oligos and the ligated strands
hybridized to them.

1.3. Sorting RNAs according to their terminal sequences

[0106] Mature eukaryotic mRNAs share structural 
features that can help in their manipulation using arrays. 
All have a "cap" structure on their 5' end, and most also
possess a 3'-terminal poly(A) tail, which is attached 
posttranscriptionally by a poly(A) polymerase. Because 
there are usually no long oligo(A) tracts in the internal 
regions of cellular RNAs, the poly(A) tail can serve as a 
naturally occurring terminal priming sequence in sorting.
The size of mRNAs (several thousand nucleotides in 
length) allows them to be amplified and analyzed direct- 
ly, without prior cleavage into fragments. 





[0107] There are known methods for preparing es- 
sentially undegraded total cellular RNA. Total cellular 
RNA is converted into complementary DNA (cDNA) us- 
ing an oligo(dT) primer and a reverse transcriptase or 
Thermus thermophilus DNA polymerase. Then, omitting 
second strand synthesis, single-stranded cDNAs (which 
possess oligo(dT) extensions at their 5' end and variable 
3' termini) are sorted according to their 3' termini on a
sectioned binary array and are ligated there to pre-hy- 
bridized adaptors of a predetermined sequence that are 
complementary to the immobilized oligos' constant se- 
quence, and that introduce into a cDNA molecule the 3'- 
terminal priming site. The cDNA is amplified, using two 
primers for PCR: oligo(dT) and an oligo complementary 
to the adaptor. 

2. Preparing partial strands of nucleic acids on 
oligonucleotide arrays

[0108] There are two aspects to this procedure: first, 
the generation of partial strands (partials), and second, 
the sorting of partials according to their terminal oligo 
segments. All of the embodiments described below are 
based on the following principle: in generating partials 
from a strand, one of the original strand ends is pre- 
served (it will be referred to as the "fixed" end), whereas 
the other end is truncated to a different extent in the var- 
ious partials (it will be referred to as the "variable" end). 
Although either the 5' or the 3' end of the original strand 
can serve as the fixed end, it is preferable that the 5'
end be fixed. If amplification of sorted partials is desir- 
able, it is preferable that the 5' end of the original strand, 
i.e., the fixed end, be provided with a priming region prior 
to partialing by any of the methods described above, and 
that partialing be carried out on a sectioned array. Either 
an individual strand or a mixture of strands can be
subjected to partialing; however, if the mixture is very
complex (such as a restriction digest of a large genome), it
is desirable that the mixture first be sorted into less com- 
plex groups of strands, as described above. The groups 
of strands used for preparing partials should essentially 
be devoid of contaminating strands; therefore, sorting 
by terminal sequences is preferable for the preliminary 
sorting. If preliminary sorting is performed, the strands 
will already contain terminal priming regions necessary 
for amplification of the partials. Partialing can be per- 
formed on either DNA or RNA, the final product being 
either DNA or RNA, in either a double-stranded or a sin- 
gle-stranded state. 
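
The principle of a fixed 5' end and a variable 3' end can be sketched in Python. The sketch assumes a single strand written 5' to 3', with one partial of every possible length, and it ignores the 5' priming region; the function name partials_by_terminal_oligo and the toy strand are illustrative only.

```python
def partials_by_terminal_oligo(strand, k):
    """Map each k-long 3'-terminal oligo to the partials that end in it.
    Every partial keeps the strand's 5' end (the fixed end)."""
    groups = {}
    for end in range(k, len(strand) + 1):
        partial = strand[:end]        # fixed 5' end, variable 3' end
        groups.setdefault(partial[-k:], []).append(partial)
    return groups

print(partials_by_terminal_oligo("ACGTACG", 3))
# {'ACG': ['ACG', 'ACGTACG'], 'CGT': ['ACGT'], 'GTA': ['ACGTA'], 'TAC': ['ACGTAC']}
```

An oligo that occurs more than once in the strand (here ACG) collects more than one partial at its address.
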

2.1. Methods employing enzymatic cleavage of DNA 
fragments

[0109] The purpose of the cleavage is to produce a 
set of partials of every possible length; therefore, DNA 
should be cleaved as randomly as possible, and to the 
extent that there is approximately one cut per strand. 
Deoxyribonuclease I (DNase I) cleaves both
double-stranded and single-stranded DNA; however,
double-stranded DNA is preferable as the starting material for
preparing partials because of its essentially homogene- 
ous secondary structure, so that every segment of a 
DNA molecule is equally accessible to cleavage. Dou- 
ble-stranded DNA fragments are produced as a result 
of "symmetric" PCR that can be carried out when sorting 
strands. An advantage of using DNase I is that it pro- 
duces fragments with 5'-phosphoryl and 3'-hydroxyl ter- 
mini, that are suitable for enzymatic ligation. 
[0110] After cleavage of the double-stranded DNA 
fragments, DNase is removed, e.g., by phenol extrac- 
tion. The (partial) strands are then melted apart and are 
hybridized to a sectioned binary array, wherein the im- 
mobilized oligos are pre-hybridized with shorter comple- 
mentary 5'-phosphorylated oligos of a constant se- 
quence that cover (mask) the immobilized oligos except 
for a segment that consists of a variable sequence. Hy- 
bridization is carried out under conditions that favor the 
formation of perfectly matched hybrids of a length that 
is equal to the length of the unmasked (variable) seg- 
ment of the immobilized oligo, and that minimize the for- 
mation of imperfectly matched hybrids. After washing 
away unbound strands, the bound strands are ligated to 
the masking oligos by incubation with a DNA ligase. The 
ligated masking oligos will themselves serve as the
second (3'-terminal) priming region of a partial strand. (All
the partials of a strand will share the same 5' priming 
sequence that had been introduced into the strand be- 
fore generation of the partials). If restriction fragments 
are to be partialed that possess some restriction site at 
their termini and do not possess this site internally, it is 
preferable that the 3' terminal priming region added to 
the partials include that site. This increases the specif- 
icity of terminal priming during subsequent amplification 
of the partials by PCR. Subsequent extension, washing, 
and amplification steps are as described in Example 1.1. 
If the partials are prepared for the purpose of sequence 
determination, asymmetric PCR can be performed. Al- 
ternatively, an RNA polymerase promoter sequence can 
be included in one of the two primers, and amplified DNA 
is then transcribed to produce multiple single-stranded 
RNA copies of one of the two complementary partial 
strands. 

2.2. Methods employing chemical degradation of DNA

[0111] These methods are applicable to both double- 
stranded and single-stranded nucleic acids. Chemical 
degradation is, in most cases, essentially random. It can 
be performed under conditions that destroy secondary 
structure, and the small size of the modifying chemicals 
gives them ready access to nucleotides
in secondary structures.

[0112] Both base-nonspecific reagents and
base-specific reagents can be used. In the latter case, after
base-specific cleavage is performed separately with
several portions of the sample, the portions are mixed
together to form a set of all possible partial DNA lengths.
The main drawback to chemical cleavage is that the
location of the terminal phosphate groups on the
fragments is opposite to what is required for enzymatic
ligation: 5'-hydroxyl and 3'-phosphoryl groups are produced
in most cases. To overcome this problem, enzymatic
dephosphorylation of 3' ends can be carried out.

2.3. Method of preparing partials directly on a sectioned
array, without prior degradation of nucleic acids

[0113] In this embodiment, the generation of partials
and their sorting according to the identity of the
sequences at their variable ends occur essentially in one
step. First, a strand or a group of strands (if
double-stranded nucleic acid is used as a starting material, the
complementary strands are first melted apart) is directly
hybridized to a sectioned ordinary array, whose oligos
only comprise variable sequences of a pre-selected
length, and that are immobilized by their 5' termini.
Optimally, hybridization is carried out under conditions in
which hybrids can only form whose length is equal to
the length of the immobilized oligo. If the array is
comprehensive, then a hybrid is formed somewhere within
the array for every oligo that occurs in a DNA's
sequence. After hybridization, the entire array is washed
and incubated with an appropriate DNA polymerase in
order to extend the immobilized oligo, using the
hybridized strand as a template. Each product strand is a
partial (complementary) copy of the hybridized strand.
Each partial begins at the place in the strand's sequence
where it has been bound to the immobilized oligo and
ends at the priming region at the 5' terminus of the
strand. If a priming region has not been introduced at
the strand's 5' end before partialing, it can be generated
at this step, after the hybrids that have not been
extended are eliminated by washing. This can be done either
by ligating the 5' end of the bound strand to a
single-stranded oligoribonucleotide adaptor, or by tailing the
immobilized partial copy with a homopolynucleotide.
The entire array is vigorously washed under conditions
that remove the original full-length strands and
essentially all other material not covalently bound.
Subsequent amplification of the immobilized partials can be
carried out in different ways, dependent on whether it is
desired to use linear or exponential amplification.
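
The one-step generation and sorting of partials can be modeled in Python under idealized assumptions: a single template strand written 5' to 3', perfect-match hybridization at every position, and complete extension. The function names and the toy strand are hypothetical. Each immobilized oligo is extended from its binding site to the 5' terminus of the template, so the product is the reverse complement of the template from its 5' end through the binding site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def on_array_partials(strand, k):
    """{address (immobilized oligo, 5'->3'): [extended partial copies]}."""
    wells = {}
    for i in range(len(strand) - k + 1):
        # The immobilized oligo that binds strand[i:i+k].
        address = reverse_complement(strand[i:i + k])
        # Extension copies the template from the binding site to its 5' end.
        partial = reverse_complement(strand[:i + k])
        wells.setdefault(address, []).append(partial)
    return wells

print(on_array_partials("ACGTT", 3))
# {'CGT': ['CGT'], 'ACG': ['ACGT'], 'AAC': ['AACGT']}
```

Every partial begins with the oligo of its own address, which is the sense in which generation and sorting happen in one step.
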
[0114] Exponential copying results in the generation
of partials and their complements. For a strand to be
exponentially amplified by PCR, both of its termini
should be provided with a priming region, preferably
different priming regions. The immobilized
(complementary) partial contains only one (3'-terminal) priming region,
and a complementary copy produced by linear copying
would also have only one priming region (on its 5' end).
For RNA copies to have a priming region at their 5' ends,
the immobilized partial should have been provided with
an RNA polymerase promoter downstream of its 3'
terminal priming region using the methods described
herein. The second priming region that is needed for
exponential amplification can be introduced at the 3' ends of
the complementary copies as follows.

(a) The 3' termini of RNA copies can then be ligated 
to oligoribonucleotide or oligodeoxyribonucleotide 
adaptors which are phosphorylated at their 5' end 
and whose 3' end is blocked. Exponential PCR can 
be performed by utilizing the two primers that cor- 
respond to the two priming regions, and then incu- 
bating with Tth DNA polymerase. 

(b) If the amplified copies are DNA, they can be 
transferred, such as by blotting, (after melting them 
free of the immobilized partial) onto a binary array 
that is a mirror copy of the first array in the arrange- 
ment of the variable segments of its immobilized ol- 
igos. The constant segments of this binary array are 
pre-hybridized to masking oligos whose ligation to 
the 3' termini of the transferred DNAs (by DNA 
ligase) results in generation of the second priming 
region to permit exponential PCR. 

In methods (a) and (b), both priming regions 
preferably contain, when applicable, the recognition 
sequence of the restriction endonuclease that was 
used to digest the genomic DNA before full-length 
strand sorting, and which had thus been substan- 
tially eliminated from the strands' internal regions. 

(c) If partials are surveyed only for oligos that occur 
in one complementary strand (such as detecting on- 
ly parental oligos), either only one of the two differ- 
ent primers should be labeled, or the primers should 
be labeled differently. It is also possible to use la- 
beled substrates during asymmetric PCR. 

3. Surveying oligonucleotides with binary arrays

[0115] Surveying oligo content can be carried out in 
the different embodiments of the invention by hybridiza- 
tion of strands (or partials) to an ordinary array, followed 
by detection of those hybridized. However, the signal- 
to-noise ratio is not high enough to always avoid ambig- 
uous results. The most significant problem is inability to 
sufficiently discriminate against mismatched basepairs 
that occur at the ends of hybrids. That hampers analysis 
of complex sequences. The use of binary arrays helps 
to overcome this problem. 

[0116] Binary arrays are also useful for surveying 
longer oligos than are easily surveyed on an ordinary 
array (e.g., signature oligos) without increasing the size 
over that of an ordinary array. 

[0117] Immobilized oligos in a binary survey array can 
have either free 5' or 3' ends, and the constant segment 
can be either upstream or downstream. In most cases, 
it is preferable that the 3' ends of immobilized oligos be 
free, and that their constant segments be upstream. 
[0118] Surveying can utilize sectioned arrays. However,
the use of plain arrays is preferable because they
are less expensive and more amenable to miniaturization.
The following methods are based on the use of
plain binary arrays and involve fragmentation of the
strands or partials prior to surveying.

3.1. Comprehensive surveys of DNA strands

[0119] Every oligo present in a strand or in a partial, 
or in a group of strands or partials, is surveyed. If a sur- 
vey of partials is performed in order to establish nucle- 
otide sequences, it is preferable that each partial be rep- 
resented by the same sense copies. Thus, there should 
be only one of the complementary strands in a sample 
or the complementary strands should be differentiated,
e.g., one strand should produce either no detectable sig- 
nal or a weaker signal. This can be accomplished by 
amplifying the partials linearly or by the use of asymmet- 
ric PCR. 

[0120] DNA strands (or partials) to be surveyed are 
preferably digested with nuclease S1 under conditions
that destabilize DNA secondary structure. The digestion 
conditions are chosen so that the DNA pieces produced 
are as short as possible, but at the same time, most are 
at least one nucleotide longer than the variable segment 
of the oligos immobilized on the binary array. If the sur- 
veyed strands or partials have been previously sorted 
and amplified on a sectioned array, this degradation pro- 
cedure can be performed simultaneously in each well of 
that array. Alternatively, if it is desired to store that array 
as a master for later use, the array can be replicated by 
blotting onto another sectioned array. The DNA is then 
amplified within the replica array by (asymmetric) PCR 
prior to digestion with nuclease S1.
[0121] After digestion, the nuclease is inactivated by,
for example, heating to 100°C, and the DNA pieces are
hybridized to an array whose immobilized oligos' constant
segments are pre-hybridized to 5'-phosphorylated
complementary masking oligos. Preferably, the constant
segment contains a restriction site that has been
eliminated from the internal regions of the strands prior
to sorting and is long enough so that its hybrid with the
masking oligo is preserved during subsequent procedures.

[0122] The array is incubated with DNA ligase to ligate
the masking oligos to only those hybridized DNA strands 
(or partials) whose 3' terminal nucleotide is immediately 
adjacent to the 5' end of the masking oligo, and matches 
its counterpart in the immobilized oligo. DNA ligase is 
especially sensitive to mismatches at the junction site. 
[0123] After all non-ligated DNA pieces have been
washed away under much more stringent conditions
than were used during hybridization, the immobilized oligos
are extended by incubation with a DNA polymerase,
preferably by only one nucleotide, using the protruding
part of the ligated DNA piece as a template, and
preferably using the chain-terminating 2',3'-dideoxynucleotides
as substrates. Extension is possible only if the
3'-terminal base of the immobilized oligo forms a perfect
basepair with its counterpart in the hybridized DNA
piece. The use of the dideoxynucleotides ensures that
all hybrids are extended by exactly one nucleotide and
that all are of the same length. The array is then washed
under conditions sufficiently stringent to remove unextended
hybrids.
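
The ligation and extension requirements described above can be sketched in code. The following is a minimal, idealized model and not part of the disclosed method: it assumes all-or-nothing perfect matching, immobilized oligos with free 3' ends and upstream constant segments, and sequences written 5' to 3'. The function name extension_base and the example sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extension_base(piece, variable):
    """Return the dideoxynucleotide added in one array area, or None.

    piece    -- DNA piece, written 5'->3'
    variable -- variable segment of the immobilized oligo, written 5'->3'
    """
    k = len(variable)
    if len(piece) <= k:
        return None  # nothing protrudes to serve as a template
    # Ligation: the 3'-terminal k bases of the piece must pair perfectly
    # with the variable segment, so that its 3' end abuts the masking oligo.
    if reverse_complement(piece[-k:]) != variable:
        return None
    # Extension: the template base lies immediately 5' of the hybridized region.
    return COMPLEMENT[piece[-k - 1]]


print(extension_base("TTACG", "CGT"))  # perfect match, template base is T
print(extension_base("TTACG", "CGA"))  # mismatch, so no ligation and no signal
```

In this simplified model a piece that is not at least one nucleotide longer than the variable segment gives no signal, which mirrors the length requirement placed on the nuclease S1 digestion.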

3.2. Detection of hybrids

[0124] Hybrids can be detected by a number of different
means. Unlabeled hybrids can be detected by using
surface plasmon resonance techniques, which currently
can detect 10^8 to 10^9 hybrid molecules per square millimeter.
Alternatively, hybrids can be conventionally labeled,
such as with radioactive or fluorescent groups.
Fluorescent labels are convenient.
[0125] To ensure the lowest level of background labeling,
it is preferable to label hybrids in a manner such
that their detection is dependent on the success of both a
ligation and an extension step. This can be accomplished
within the scheme of oligo surveying by labeling
the masking oligos and the 2',3'-dideoxynucleotides
used for the extension with fluorescent dyes possessing
different emission spectra. The array can then be
scanned at different wavelengths, corresponding to the
emission maxima of the two dyes, and only signals from
those areas that emit fluorescence of both colors are
taken as a positive result.
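
The two-color criterion can be illustrated by a short sketch. The intensity values, the single shared threshold, and the function name positive_areas are hypothetical and serve only to show the logical AND of the two signals; a real scanner would require calibration for each dye.

```python
def positive_areas(masking_signal, ddntp_signal, threshold):
    """Return the areas that fluoresce in both colors.

    masking_signal -- intensity per area at the masking-oligo dye's emission maximum
    ddntp_signal   -- intensity per area at the dideoxynucleotide dye's emission maximum
    threshold      -- assumed cutoff, applied to both channels for simplicity
    """
    return {
        area
        for area, value in masking_signal.items()
        if value >= threshold and ddntp_signal.get(area, 0.0) >= threshold
    }


# Hypothetical readings keyed by the variable segment of each area.
masking = {"CAT": 0.90, "ATG": 0.80, "TTT": 0.70}
ddntp = {"CAT": 0.85, "ATG": 0.10, "TTT": 0.75}
print(sorted(positive_areas(masking, ddntp, threshold=0.5)))
```

Area "ATG" is rejected here because only the ligation-dependent color is present, which is the background case the two-label scheme is meant to exclude.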

[0126] After hybrids are extended (concomitant with 
labeling) and edited, the array is thoroughly washed to 
remove unincorporated label, destroy unextended hy- 
brids, and discriminate one more time against mis- 
matched hybrids that might have remained. A preferred 
method is to wash the array at steadily increasing tem- 
perature, with the signal from each area being read at 
a pre-determined time, when the conditions ensure the 
highest selectivity for the particular hybrid that forms in 
that area. Other conditions (such as denaturant and/or 
salt concentration) can also be controlled over time. The
fluorescence pattern can be recorded at predetermined 
time intervals with a scanning microfluorometer, such as 
an epifluorescence microscope. 

4. Determination of the nucleotide sequences of strands 
in a mixture when each strand possesses at least one 
oligo that does not occur in any other strand in the 
mixture

[0127] Figures 8 to 11 depict the determination of the 
sequences of two mixed strands using the methods of 
the invention. The example demonstrates the power of 
the invention to identify all the oligos present in a strand 
(i.e., its strand set) when it possesses at least one oligo 
that does not occur in any other strand in the mixture. 
In particular, the example demonstrates: (a) how the da- 
ta obtained by surveying the partial strands generated 
from a mixture of strands and sorted by their variable 
termini (i.e., the upstream subset of each address) and 
the inferred downstream subset of each address (which
together form the indexed address sets) are used to
construct the unindexed address sets; and (b) how the 
unindexed address sets are compared to each other to 
identify prime sets. The example also demonstrates 
how the oligos contained in a strand set are assembled 
into the sequence of the strand, even though the primary 
data is obtained from a mixture. In particular, the exam- 
ple demonstrates: (a) how oligos in a strand set are as- 
sembled into sequence blocks; (b) how the contents of 
the indexed address sets are filtered so that only infor- 
mation pertaining to the oligos in a particular strand set 
remains; (c) how this filtered data is re-expressed in 
terms of the sequence blocks that are contained in that 
particular strand; (d) how information in the resulting 
"block sets" is used to identify those blocks that definite- 
ly occur only once in the strand ("unique blocks") and to 
identify those that can potentially occur more than once; 
(e) how information in block sets of unique blocks is 
used to determine the relative order of the blocks that 
occur only once in the strand; (f) how the information in 
the block sets limits the positions at which the other 
blocks can occur (relative to other blocks); and (g) how 
a consideration of the sequences at the ends of blocks, 
in combination with a consideration of the relative posi- 
tions of the blocks, leads to the unambiguous determi- 
nation of the complete sequence of the strand. This ex- 
ample also illustrates: (a) how oligos that occur more 
than once in a strand are identified and located within 
the sequence, even though the survey data contain no 
information as to the number of times a particular oligo 
occurs in a partial or a mixture of partials having the 
same terminal oligo; and (b) how the sequences of dif- 
ferent strands in a mixture can be determined separate- 
ly, despite the fact that many of the oligos occur in more 
than one strand. 

[0128] Figure 8a shows the sequences of two short 
strands (parental strands) that are assumed to be 
present in a mixture (with no other strands). It is as- 
sumed that complete sets of partials have been gener- 
ated from this mixture, and that each set of partials has 
been separately surveyed, with the partials sharing the 
same address oligo being surveyed together. For the 
purpose of illustrating the method of analyzing the data, 
it is assumed that the address oligos and the surveyed 
oligos are three nucleotides in length. In practice, longer 
oligos should be used. However, for illustration it is eas- 
ier to comprehend an example based on trinucleotides. 
The same methods of analyzing the data apply when 
longer oligos are surveyed, when much longer strands 
are in the mixture, and when the mixture contains many 
more strands. 

[0129] Figure 8b shows the upstream subsets deter- 
mined by surveying and the downstream subsets in- 
ferred (i.e., Figure 8b shows indexed address sets). The 
address oligos (bold letters) are listed vertically in the 
center of the diagram. The oligos listed horizontally to 
the left of each address oligo are those oligos that were 
detected in a survey of the partials at that address (the 
upstream subset). The oligos listed horizontally to the
right of each address oligo are those inferred from the 
upstream subsets to occur downstream of that address 
oligo (the downstream subset). For example, oligo
"ACC" is contained in the upstream subset of the address
oligo "CCT". This means that oligo "CCT" occurs
downstream of oligo "ACC" in at least one strand in the 
mixture. Therefore "CCT" is inferred to be in the down- 
stream subset of address set "ACC". The remaining 
downstream oligos in all of the address sets are similarly 
inferred. Note that an address oligo is a member of its 
own upstream and downstream subsets. 
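
The inference of downstream subsets from surveyed upstream subsets can be sketched as follows. Figure 8a is not reproduced in this text, so the two strand sequences below are an assumption: they are reconstructed from the block assemblies described later for strands A and B. The survey itself is simulated by enumerating partials that share a fixed 5' end, which is also an assumption; the function names are hypothetical.

```python
STRAND_A = "CATGGTACCTTGGTAA"   # assumed, reconstructed from the assembly of strand A
STRAND_B = "ATGGTCCTTGGTACCTA"  # assumed, reconstructed from the assembly of strand B


def oligos(seq, n=3):
    """List the overlapping oligos of length n in seq, in order."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def upstream_subsets(strands, n=3):
    """Simulate the survey: each partial is filed under its terminal (address) oligo."""
    up = {}
    for strand in strands:
        for end in range(n, len(strand) + 1):
            partial = strand[:end]
            up.setdefault(partial[-n:], set()).update(oligos(partial, n))
    return up


def infer_downstream(up):
    """If X is upstream of address Y, then Y is downstream of address X."""
    down = {address: set() for address in up}
    for address, subset in up.items():
        for oligo in subset:
            down.setdefault(oligo, set()).add(address)
    return down


up = upstream_subsets([STRAND_A, STRAND_B])
down = infer_downstream(up)
print("ACC" in up["CCT"], "CCT" in down["ACC"])
```

Because every partial contains its own terminal oligo, each address oligo ends up in both its own upstream subset and its own downstream subset, as noted above.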
[0130] After the indexed address sets of all addresses
in the parental strands have been determined (as shown 
in Figure 8b), the information is organized into unin- 
dexed address sets (Figure 8c), having no division into 
downstream and upstream subsets, but merely listing, 
for each address oligo, those oligos that occur in either 
the upstream or downstream subset (or in both). In Fig- 
ure 8c, the address oligos (bold letters) are listed verti- 
cally on the left side of the diagram. Note that the ad- 
dress oligo is a member of its own unindexed address 
set. 

[0131] Unindexed address sets are grouped together
according to the identity of the oligos they contain (Fig- 
ure 8d). Unindexed address sets that contain an identi- 
cal set of oligos are grouped together. It can be seen 
that three groups of address sets are formed in this ex- 
ample. The groups are identified by the Roman numer- 
als (I, II, and III). The address oligos of each group (for 
example, CTA, GTC, and TCC in group II) always occur 
together in a strand and can occur together in more than 
one strand. 
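
The construction and grouping of unindexed address sets can be sketched as follows. The strand sequences are assumed (reconstructed from the assemblies described for strands A and B, since Figure 8a is not reproduced here), and the function names are hypothetical.

```python
from collections import defaultdict

STRANDS = ["CATGGTACCTTGGTAA", "ATGGTCCTTGGTACCTA"]  # assumed example strands


def oligos(seq, n=3):
    """List the overlapping oligos of length n in seq, in order."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def unindexed_address_sets(strands, n=3):
    """Merge the upstream and downstream subsets of every address oligo."""
    sets = defaultdict(set)
    for strand in strands:
        found = oligos(strand, n)
        for i, address in enumerate(found):
            sets[address].update(found[:i + 1])  # upstream subset
            sets[address].update(found[i:])      # downstream subset
    return dict(sets)


def group_identical(address_sets):
    """Group the address oligos whose unindexed address sets are identical."""
    groups = defaultdict(list)
    for address, members in address_sets.items():
        groups[frozenset(members)].append(address)
    return {members: sorted(addresses) for members, addresses in groups.items()}


groups = group_identical(unindexed_address_sets(STRANDS))
for members, addresses in sorted(groups.items(), key=lambda item: len(item[0])):
    print(len(members), "oligos in the set shared by addresses", addresses)
```

With these assumed strands the sketch yields three groups, one of which contains exactly the address oligos CTA, GTC, and TCC, in agreement with group II above.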

[0132] Each group of identical address sets is then 
compared to all other groups of identical address sets 
to see if its common address set appears to be a prime 
by seeing whether any other address set is a subset of 
it. For example, in Figure 8d, the address set common 
to group III is not a prime address set, because the address
set common to group I is a subset of the address
set common to group III. However, the address set com- 
mon to group I and the address set common to group II 
appear to be prime address sets. 
[0133] Each putative prime address set is then tested
to see if it is a strand set by examining all the address 
sets that contain all of the oligos that are present in it. 
For example, in Figure 9a, all the address sets that con- 
tain all the oligos present in the putative prime address 
set common to group I are listed together (namely the 
address sets contained in groups I and III). The address 
oligos are shown in bold letters on the left side of the 
diagram, and the groups are identified by Roman nu- 
merals. The address set common to group I is indeed a 
prime address set (and therefore it contains a single 
strand set) because a list of the eleven oligos that are 
found in every address set in the diagram (they are seen 
as full columns) is identical to the list of eleven addresses
on the left side of the diagram. Similarly, Figure 9b
shows why the address set common to group II is also
a prime set. The twelve oligos common to every address 
set in the diagram are all found in the list of twelve ad- 
dresses on the left side of the diagram. Had either of 
these putative prime address sets not turned out to be 
a prime set (by the criterion described above), then it 
would have been identified as a pseudo-prime address 
set, and further analysis would have been required to 
decompose it into its constituent strand sets. 
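
The subset comparison and the strand-set test can be sketched as follows. The strands are assumed (reconstructed from the assemblies described for strands A and B), the function names are hypothetical, and the unindexed address sets are built with a shortcut that holds only when complete sets of partials have been surveyed.

```python
STRANDS = ["CATGGTACCTTGGTAA", "ATGGTCCTTGGTACCTA"]  # assumed example strands


def oligo_set(seq, n=3):
    """Return the set of oligos of length n that occur in seq."""
    return frozenset(seq[i:i + n] for i in range(len(seq) - n + 1))


def unindexed_address_sets(strands, n=3):
    # Shortcut for complete sets of partials: the unindexed address set of an
    # oligo is the union of the oligo sets of all strands that contain it.
    per_strand = [oligo_set(s, n) for s in strands]
    addresses = frozenset().union(*per_strand)
    return {a: frozenset().union(*(s for s in per_strand if a in s))
            for a in addresses}


def putative_primes(address_sets):
    """Keep the address sets that have no other address set as a proper subset."""
    distinct = set(address_sets.values())
    return [s for s in distinct if not any(other < s for other in distinct)]


def is_strand_set(candidate, address_sets):
    """The oligos common to every containing address set must equal the addresses."""
    containing = {a for a, members in address_sets.items() if candidate <= members}
    common = frozenset.intersection(*(address_sets[a] for a in containing))
    return common == containing


address_sets = unindexed_address_sets(STRANDS)
primes = putative_primes(address_sets)
for prime in sorted(primes, key=len):
    print(len(prime), "oligos, strand set:", is_strand_set(prime, address_sets))
```

Under these assumptions both putative prime sets pass the test, so neither would be classified as a pseudo-prime address set.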
[0134] Once the strand sets in a mixture have been 
identified, the oligos in each strand set can be assem- 
bled into the strand sequence in a series of steps, as 
illustrated in Figure 10 (which utilizes the strand set de- 
termined in Figure 9a). 

[0135] First, the oligos in the strand set are assembled
into sequence blocks. A sequence block contains one 
or more uniquely overlapping oligos. Two oligos of 
length n uniquely overlap each other if they share an
identical sub-sequence that is n-1 nucleotides long and
no other oligos in the same strand set share that sub- 
sequence. For example, for the strand set shown in Fig- 
ure 10a, the oligos "CAT" and "ATG" share the sub-se- 
quence "AT" which does not occur in other oligos. These 
two oligos therefore uniquely overlap to form the sequence
block "CATG", as shown in Figure 10b. Similarly,
oligo "TGG" uniquely overlaps oligo "GGT" by the com- 
mon sub-sequence "GG", and oligo "GGT" also uniquely 
overlaps (on its other end) oligo "GTA" by the common
sub-sequence "GT". Thus, the three oligos ("TGG",
"GGT", and "GTA") can be maximally overlapped to form 
sequence block "TGGTA". In forming sequence blocks, 
the following rule is adhered to: two oligos can be in- 
cluded in the same block if they are the only oligos in 
the strand set to possess their common sub-sequence. 
Thus, "ATG" does not uniquely overlap "TGG", because 
the strand set contains a third oligo, "TTG", that shares 
the common sub-sequence "TG". If, following these 
rules, an oligo does not uniquely overlap any other oligo, 
then a sequence block consists of only that oligo. For 
example, "TAA" forms its own block. Following the 
above rules, the eleven oligos that occur in strand set A 
can be assembled into four sequence blocks. 
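
The unique-overlap rule can be sketched as follows, using the eleven oligos of strand set A named in the text. The function name sequence_blocks is hypothetical, and the sketch assumes that unique overlaps never form a closed cycle (such oligos would be skipped).

```python
STRAND_SET_A = {"CAT", "ATG", "TGG", "GGT", "GTA", "TAC",
                "ACC", "CCT", "CTT", "TTG", "TAA"}


def sequence_blocks(strand_set):
    """Assemble oligos of length n into blocks by unique (n-1)-nucleotide overlaps."""
    # Map each (n-1)-mer to the oligos that contain it (as prefix or suffix).
    sharing = {}
    for oligo in strand_set:
        for sub in (oligo[:-1], oligo[1:]):
            sharing.setdefault(sub, set()).add(oligo)
    # Link oligo -> successor when they are the only two oligos sharing the
    # sub-sequence, and it is the suffix of one and the prefix of the other.
    successor = {}
    for oligo in strand_set:
        sub = oligo[1:]
        if len(sharing[sub]) == 2:
            (other,) = sharing[sub] - {oligo}
            if other[:-1] == sub:
                successor[oligo] = other
    # Walk each chain from an oligo that is not itself a successor.
    blocks = []
    for oligo in sorted(set(strand_set) - set(successor.values())):
        block = oligo
        while oligo in successor:
            oligo = successor[oligo]
            block += oligo[-1]
        blocks.append(block)
    return blocks


blocks_a = sequence_blocks(STRAND_SET_A)
print(blocks_a)
```

The sub-sequence "TG" is shared by three oligos ("ATG", "TGG", and "TTG"), so no link is made through it, which is why "CATG" and "TGGTA" remain separate blocks.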
[0136] Second, the data contained in the indexed ad- 
dress sets shown in Figure 8b are filtered to remove ex- 
traneous information that does not pertain to strand set 
A. Figure 10c shows the resulting filtered address sets. 
All address sets whose address oligo is not one of the 
oligos in strand set A are eliminated. In addition, all oli- 
gos that are not members of strand set A are removed 
from the upstream and downstream subsets of the re- 
maining address sets. The resulting filtered address 
sets are then grouped together according to the oligos 
that are contained in each block. For example, the fil- 
tered address sets for address oligos "CAT" and "ATG" 
have been grouped together in Figure 10c because 
these two oligos are contained in sequence block 
"CATG". In Figure 10c, the address oligos found in the 
same block are identified by rectangular boxes. In addition,
oligos that occur in the same block are grouped
together within each upstream and downstream subset. 
[0137] Third, the filtered address sets are converted 
into block sets, as shown in Figure 10d. In a block set, 
the information from different address sets is combined.
Instead of a different horizontal line for each filtered ad- 
dress set that pertains to a particular block, the informa- 
tion in all of the address sets that pertain to that partic- 
ular block is combined into a single horizontal line. For 
example, in Figure 10c, five different filtered address sets
pertain to sequence block "TACCTTG". In Figure 10d, 
these five lines are combined into a single line in which 
the address oligos are replaced by an "address block", 
shown as "TACCTTG" surrounded by a bold box. Simi- 
larly, the upstream oligos are replaced by upstream 
blocks, and the downstream oligos are replaced by 
downstream blocks. In substituting sequence blocks for 
the upstream (or downstream) oligos that are contained 
in the filtered address sets for a given address block, 
the following rule is adhered to: a sequence block only 
occurs in the upstream subset (or in the downstream 
subset) of an address block if every oligo that is contained
in that sequence block occurs in the upstream (or
in the downstream) subset of every filtered address set 
that pertains to that address block. For example, se- 
quence block "CATG" occurs in the upstream subset of 
address block "TACCTTG" because oligos "CAT" and 
"ATG" occur in the upstream subset of address oligos 
"TAC", "ACC", "CCT", "CTT", and "TTG". 
[0138] Often, a sequence block does not occur in its 
own upstream or downstream subset. For example, se- 
quence block "CATG" does not occur in the upstream 
or downstream subset of its own block set (i.e., in block 
set "CATG"), because oligo "ATG" is not present in the 
upstream subset of address set "CAT" and oligo "CAT" 
is not present in the downstream subset of address set 
"ATG". When a sequence block does not occur in its own 
upstream or downstream subset, this indicates that that 
sequence block occurs only once in the nucleotide se- 
quence of that strand. However, a sequence block may 
occur in both the upstream subset and in the down- 
stream subset of its own block set. For example, se- 
quence block "TGGTA" occurs in both the upstream 
subset and in the downstream subset of block set "TGG- 
TA". When a sequence block does occur in its own up- 
stream and downstream subsets, it indicates that the sequence
block may, but need not, occur more than once
in the sequence. The presence of more than one paren- 
tal strand in the original mixture can introduce additional 
oligos into the filtered upstream and downstream sub- 
sets that can cause a block that actually occurs only 
once in a sequence to appear in both the upstream and 
downstream subsets of its own block set. However, fur- 
ther analysis of the data determines the multiplicity of 
each block in the strand (as described below), thus re- 
solving these uncertainties. For convenience, block sets 
that pertain to blocks that definitely occur only once in 
the sequence are listed together. For example, in Figure
10d, block set "CATG" and block set "TACCTTG" are
listed together.
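
The filtering step, the conversion to block sets, and the identification of unique blocks can be sketched together. The strand sequences are assumed (reconstructed from the assemblies described for strands A and B), the indexed address sets are simulated rather than measured, and the function names are hypothetical. A block is treated as unique when it is missing from at least one of its own subsets.

```python
STRANDS = ["CATGGTACCTTGGTAA", "ATGGTCCTTGGTACCTA"]  # assumed example strands
BLOCKS_A = ["CATG", "TGGTA", "TACCTTG", "TAA"]       # sequence blocks of strand set A


def oligos(seq, n=3):
    """List the overlapping oligos of length n in seq, in order."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def indexed_sets(strands, n=3):
    """Simulated upstream and downstream subsets of every address oligo."""
    up, down = {}, {}
    for strand in strands:
        found = oligos(strand, n)
        for i, address in enumerate(found):
            up.setdefault(address, set()).update(found[:i + 1])
            down.setdefault(address, set()).update(found[i:])
    return up, down


def block_sets(blocks, up, down, n=3):
    """Return {address block: (upstream blocks, downstream blocks)}."""
    members = {block: set(oligos(block, n)) for block in blocks}
    strand_set = set().union(*members.values())
    result = {}
    for address_block, addresses in members.items():
        # Filter to the strand set, then keep only the oligos common to
        # every filtered address set that pertains to this block.
        up_common = set.intersection(*(up[a] & strand_set for a in addresses))
        down_common = set.intersection(*(down[a] & strand_set for a in addresses))
        result[address_block] = (
            {b for b, m in members.items() if m <= up_common},
            {b for b, m in members.items() if m <= down_common},
        )
    return result


def unique_blocks(sets):
    """Blocks absent from their own upstream or downstream subset occur once."""
    return {b for b, (upstream, downstream) in sets.items()
            if not (b in upstream and b in downstream)}


up, down = indexed_sets(STRANDS)
sets_a = block_sets(BLOCKS_A, up, down)
print(sorted(unique_blocks(sets_a)))
```

Note that a single-oligo block such as "TAA" always appears in both of its own subsets, because an address oligo is a member of its own upstream and downstream subsets; it is therefore treated as potentially occurring more than once until the later steps resolve its multiplicity.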

[0139] Fourth, the position of each sequence block 
relative to the other sequence blocks is determined. An 
examination of the block sets that pertain to unique 
blocks (that definitely occur only once in the sequence 
of the strand) indicates their relative positions. For ex- 
ample, in Figure 10d, block set "CATG" indicates that 
unique sequence block "TACCTTG" occurs down- 
stream of unique sequence block "CATG". This is con- 
firmed by block set "TACCTTG", in which unique se- 
quence block "CATG" occurs upstream of unique se- 
quence block "TACCTTG". The relative position of the 
two unique sequence blocks is indicated in Figure 10e, 
where the top line to the left of the arrow shows "CATG" 
upstream (to the left) of "TACCTTG". The relative posi- 
tion of the sequence blocks that can potentially occur 
more than once in the nucleotide sequence of the strand 
is determined from their presence or absence in the up- 
stream and downstream subsets of other sequence 
blocks. For example, sequence block "TAA" occurs in 
the downstream subset of block set "CATG" (and does 
not occur in the upstream subset of block set "CATG"). 
Furthermore, sequence block "TAA" also occurs in the 
downstream subset of block set "TACCTTG" (and not in 
its upstream subset). Therefore, sequence block "TAA" 
must occur downstream of both unique sequence 
blocks "CATG" and "TACCTTG". This is indicated in Fig- 
ure 10e, where the bottom line to the left of the arrow 
shows "TAA" as occurring downstream of "CATG" and 
"TACCTTG". Furthermore, sequence block "TGGTA" 
occurs only in the downstream subset of block set 
"CATG". Therefore, it must occur downstream of 
"CATG" in the sequence. On the other hand, sequence 
block "TGGTA" occurs in both the upstream and down- 
stream subsets of block set "TACCTTG". This indicates 
that "TGGTA" can potentially occur in the sequence at 
positions both upstream and downstream of unique sequence
block "TACCTTG". Finally, "TGGTA" only occurs
upstream of "TAA". This is indicated in Figure 10e,
where the bottom line to the left of the arrow contains a 
bracket that shows the range of positions at which 
"TGGTA" can occur, relative to the positions of the other 
sequence blocks. At this point in the analysis, the diagram
to the left of the arrow in Figure 10e contains all the
information obtained that pertains to strand set A. 
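
The positional reasoning of this step can be sketched as a lookup in the block sets of the unique blocks. Figure 10d is not reproduced in this text, so the two block sets below are transcribed from the relationships stated above and should be read as an assumption; the function name is hypothetical.

```python
# {unique address block: (upstream blocks, downstream blocks)}, as stated in the text.
BLOCK_SETS = {
    "CATG": (set(), {"TGGTA", "TACCTTG", "TAA"}),
    "TACCTTG": ({"CATG", "TGGTA"}, {"TGGTA", "TAA"}),
}


def position_relative_to(block, reference, block_sets):
    """Say where block may lie relative to a unique reference block."""
    upstream, downstream = block_sets[reference]
    if block in upstream and block in downstream:
        return "either side"
    if block in upstream:
        return "upstream"
    if block in downstream:
        return "downstream"
    return "unknown"


for block in ("TACCTTG", "TGGTA", "TAA"):
    print(block, "relative to CATG:",
          position_relative_to(block, "CATG", BLOCK_SETS))
for block in ("CATG", "TGGTA", "TAA"):
    print(block, "relative to TACCTTG:",
          position_relative_to(block, "TACCTTG", BLOCK_SETS))
```

An "either side" result corresponds to the bracket in the diagram: the block may fill a gap upstream of the reference block, downstream of it, or both.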
[0140] Finally, the sequence of the strand is ascer- 
tained by taking into account both the relative position 
of the sequence blocks, as shown in the diagram to the 
left of the arrow in Figure 10e, and the identity of the 
sequences at the ends of the sequence blocks. The ob- 
ject of this last step is to assemble the blocks into the 
final sequence. Four rules are followed: (a) each of the 
blocks must be used at least once; (b) the blocks must 
be assembled into a single sequence; (c) the ends of 
blocks that are to be joined must maximally overlap each 
other (i.e., if the surveyed oligos are n nucleotides in 
length, then two blocks maximally overlap each other if 
they share a terminal sub-sequence that is n-1 nucleotides
in length); and (d) the order of the blocks must be
consistent with their positions relative to one another, 
as ascertained from the block sets. For example, in Figure
10e, "CATG" is upstream of "TACCTTG". "CATG"
cannot be joined directly to "TACCTTG", since these two 
sequence blocks do not possess maximally overlapping 
terminal sequences (two nucleotides in length). Howev- 
er, an examination of the permissible positions at which 
other sequence blocks can occur indicates that "TGG- 
TA" can occur in the gap between "CATG" and "TAC- 
CTTG". The ends of these sequence blocks are then 
examined to see whether the gap can be bridged. 
"CATG" can be joined to "TGGTA" by maximally over- 
lapping their shared terminal sub-sequence "TG". Fur- 
thermore "TGGTA" can be joined to "TACCTTG" by 
maximally overlapping their shared terminal sub-se- 
quence "TA". Similarly, the gap that occurs downstream 
of "TACCTTG" can potentially be filled by both "TAA" 
and "TGGTA". "TAA" must be used, because it was not 
used at any other location. However, "TACCTTG" can- 
not be directly joined to "TAA". The solution is to join 
"TACCTTG" to "TGGTA", and then to join "TGGTA" to 
"TAA". Thus, the sequence of strand A (which is shown 
in Figure 10f) is unambiguously assembled by utilizing
sequence block "TGGTA" twice (as summarized in the 
diagram to the right of the arrow in Figure 10e). 
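
Rule (c), the maximal overlap of block ends, can be sketched as follows. The function name join_blocks is hypothetical; the block order passed to it is the one arrived at in the text for strand A.

```python
def join_blocks(blocks, n=3):
    """Join blocks in order, overlapping neighbours by n-1 nucleotides."""
    overlap = n - 1
    sequence = blocks[0]
    for block in blocks[1:]:
        if sequence[-overlap:] != block[:overlap]:
            raise ValueError(
                f"{block} does not maximally overlap the preceding block")
        sequence += block[overlap:]
    return sequence


strand_a = join_blocks(["CATG", "TGGTA", "TACCTTG", "TGGTA", "TAA"])
print(strand_a)  # CATGGTACCTTGGTAA
```

Attempting to join "CATG" directly to "TACCTTG" raises an error in this sketch, because their ends ("TG" and "TA") do not match; this is the gap that "TGGTA" bridges.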
[0141] The same procedure is followed to determine 
the sequence of strand B (see Figure 11). In this exam- 
ple, there are three sequence blocks that do not occur 
in their own upstream or downstream subsets, and they 
therefore definitely occur only once in the sequence of 
strand B (namely, sequence blocks "CTTG", "GTCC", 
and "TACC"). An examination of block set "GTCC" 
shows that "GTCC" occurs upstream of "CTTG" and 
"TACC". However, an examination of block set "CTTG" 
and an examination of block set "TACC" indicates that 
sequence blocks "CTTG" and "TACC" can both occur 
upstream and downstream of each other, which appears 
to conflict with the observation that these sequence 
blocks only occur once in the sequence of strand B. 
There is actually no conflict. Each of these sequence 
blocks does indeed occur only once. It is just that their 
positions, relative to one another, in strand B are ob- 
scured by the presence of conflicting information from 
the relative positions of oligos that occur in strand A. 
This ambiguity (indicated by the identical positions of 
sequence blocks "CTTG" and "TACC" in the diagram to 
the left of the arrow in Figure 11e) is resolved by the 
remainder of the information. The positions of those se- 
quence blocks that can potentially occur more than once 
in the sequence of strand B are determined from other
block sets. First, the block sets of the sequence blocks 
that definitely occur only once in the sequence (namely, 
block sets "CTTG", "GTCC", and "TACC") are consult- 
ed. The range of positions at which these other se- 
quence blocks can occur (relative to the positions of other
blocks) is indicated in the diagram to the left side of
the arrow in Figure 11e.

[0142] The assembly of the nucleotide sequence of
strand B proceeds as follows: "ATG" is upstream of all
other blocks. The uniquely occurring block immediately
downstream of "ATG" is "GTCC". "ATG" and "GTCC"
cannot be directly joined. However, "ATG" can be directly
joined to "TGGT", so the correct order is to join "ATG"
to "TGGT", and then to join "TGGT" to "GTCC". Neither
"CTTG" nor "TACC" can be directly joined to "GTCC".
Three different sequence blocks can be used to bridge
this gap (namely, "CCT", "GTA", and "TGGT"). The only
combination of these three sequence blocks that can fill
this gap is "CCT" alone, which bridges the gap between
"GTCC" and "CTTG". This resolves the ambiguity as to
the relative positions of "CTTG" and "TACC". "CTTG" is
therefore upstream of "TACC". "CTTG" cannot be directly
joined to "TACC". Again, there are three different sequence
blocks that can be used to fill this gap (namely,
"CCT", "GTA", and "TGGT"). The only combination of
these three sequence blocks that can fill this gap is
"TGGT" and "GTA" (i.e., "CTTG" is joined to "TGGT",
"TGGT" is joined to "GTA", and "GTA" is joined to
"TACC"). And finally, "CTA", which occurs downstream of
all other blocks, must be included in the sequence. However,
"TACC" cannot be directly joined to "CTA". There
are three different sequence blocks that can be used to
fill this gap (namely, "CCT", "GTA", and "TGGT"). The
only combination of these three sequence blocks that
can fill this gap is "CCT" alone. Thus, the assembly of
the sequence of strand B from its sequence blocks is
completed. Note that some sequence blocks that could
potentially occur in the sequence more than once actually
occur only once (e.g., "GTA"), while others actually
occur more than once (e.g., "CCT").
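
The multiplicity of each block in the completed sequence can be checked with a short sketch. The sequence of strand B used here is an assumption: it is obtained by joining the blocks in the order described above with two-nucleotide overlaps, since Figure 11 is not reproduced in this text. The function name is hypothetical.

```python
STRAND_B = "ATGGTCCTTGGTACCTA"  # assumed: the blocks joined in the order given above
BLOCKS_B = ["ATG", "TGGT", "GTCC", "CCT", "CTTG", "GTA", "TACC", "CTA"]


def count_occurrences(sequence, block):
    """Count the (possibly overlapping) occurrences of block in sequence."""
    return sum(sequence.startswith(block, i)
               for i in range(len(sequence) - len(block) + 1))


counts = {block: count_occurrences(STRAND_B, block) for block in BLOCKS_B}
for block, count in counts.items():
    print(block, count)
```

Under this assumption "GTA" is found once while "CCT" and "TGGT" are each found twice, even though the survey data alone could not distinguish these cases.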

[0143] Using the methods of this invention, the entire
sequence of strand B is unambiguously determined, despite
the fact that some oligos occur more than once in
its sequence, despite the fact that more than one sequence
block can be assembled from the oligos that occur
in the strand, despite the fact that the multiplicity of
occurrence of each oligo is not determined during surveying,
despite the fact that the strand is analyzed in a
mixture of strands, and despite the fact that the other
strand in the mixture possesses many of the same oligos.



Claims 

1. A sectioned binary oligonucleotide array comprising
an array of predetermined areas on a surface of
a solid support, wherein said areas are physically
separated from one another into sections, such that
nucleic acids in an aqueous solution generated in
one section cannot migrate to another section, each
area having therein, covalently linked to said surface,
multiple copies of a binary oligonucleotide of
a predetermined sequence, said binary oligonucleotide
consisting of a constant sequence of base-pairing
nucleotides adjacent to a variable sequence
of base-pairing nucleotides and wherein the constant
sequence is the same for all oligonucleotides
in the array.

2. An array according to claim 1, wherein one or more
of the nucleotides in said binary oligonucleotides is 
a modified nucleotide. 

3. An array according to claim 1 or claim 2 that con- 
tains all possible variable sequences of a given 
length from three to eight nucleotides.

4. An array according to any of the preceding claims 
wherein the binary oligonucleotides in each area 
have variable sequences of the same length. 

5. An array according to any of the preceding claims 
wherein the binary oligonucleotides have free 3' ter- 
mini and wherein the binary oligonucleotides have 
their constant sequences adjacent to the 5' end of 
their variable sequences. 

6. A sectioned binary oligonucleotide array according 
to any preceding claim, wherein said constant nu- 
cleotide sequence comprises one or more function- 
al sequences selected from a nucleic acid polymer- 
ase priming region, an RNA polymerase promoter 
region and a restriction endonuclease recognition 
site. 

7. A sectioned oligonucleotide array according to any 
preceding claim, wherein said sections are physi- 
cally separated by a lattice attached to said surface, 
by a lattice removably attachable to said surface, 
by wells in said solid support, or by a gel which phys- 
ically separates said areas by preventing nucleic 
acids in an aqueous solution placed in one area 
from migrating to another area. 

8. A sectioned oligonucleotide array according to any 
preceding claim, further comprising a cover remov- 
ably attachable to said solid support. 

9. An array according to claim 8, wherein said cover 
comprises a material onto which nucleic acid 
strands can be blotted. 

10. A method of sorting a mixture of nucleic acid strands
comprising the steps of: 

a) providing a solution containing a mixture of 
nucleic acid strands in single-stranded form, 

b) providing a first binary oligonucleotide array
of predetermined areas on a surface of a solid
support, each area having therein, covalently
linked to said surface, copies of a binary oligonucleotide,
consisting of a constant sequence
of base-pairing nucleotides adjacent to a variable
sequence of base-pairing nucleotides, the
constant nucleotide sequence being the same
for all oligonucleotides in the array,

c) contacting said solution to said first binary
oligonucleotide array, and

d) hybridizing said nucleic acid strands to binary
oligonucleotides in said array under conditions
sufficiently stringent to promote hybrids of
the length of the immobilized oligonucleotides
but not shorter hybrids.

11. A method according to claim 10, wherein said first
binary oligonucleotide array is a sectioned array according
to any one of claims 1-9.

12. A method according to claim 10 or claim 11, wherein
said nucleic acid strands have a common terminal
restriction site that is complementary to said constant
sequence.
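
By way of illustration only, the sorting recited in claims 10 to 12 can be modelled in silico. The following Python sketch is not part of the claimed subject matter. It assumes strands written 5' to 3' that end in a common terminal site, immobilized oligonucleotides written 5'-constant-variable-3', and exact full-length complementarity as a stand-in for sufficiently stringent hybridization. The function names and the example constant sequence GAATTC are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_binary_array(constant, variable_length):
    """Map each variable sequence (the area address) to its immobilized oligo.

    Assumption: every oligo is the shared constant sequence followed by one
    of the 4**variable_length possible variable sequences.
    """
    return {
        "".join(v): constant + "".join(v)
        for v in product("ACGT", repeat=variable_length)
    }


def sort_strands(strands, constant, variable_length):
    """Assign each strand to the area whose whole oligo pairs with its 3' end.

    A strand that cannot form a full-length hybrid with any immobilized
    oligo is left unsorted, mimicking the stringency condition of step d).
    """
    array = build_binary_array(constant, variable_length)
    areas = {}
    for strand in strands:
        for address, oligo in array.items():
            if strand.endswith(reverse_complement(oligo)):
                areas.setdefault(address, []).append(strand)
    return areas


if __name__ == "__main__":
    mixture = ["ACGTTTAGGGAATTC", "GGCATCAGGAATTC", "ACGTACGT"]
    for address, members in sort_strands(mixture, "GAATTC", 3).items():
        print(address, members)
```

In this model the third strand lacks the terminal site and therefore stays out of every area, while the other two are separated according to the three nucleotides that precede the site.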

13. A method for sorting terminally truncated partial
copies of at least one nucleic acid strand by their
variable termini utilizing an array of immobilized binary
oligonucleotides having a constant sequence
of at least three nucleotides adjacent to a variable
sequence of at least three nucleotides, comprising

a) hybridizing to said immobilized oligonucleotides
a masking oligonucleotide that is complementary
either to said constant sequence or
to a portion thereof that is adjacent to said variable
sequence;

b) hybridizing the partial copies to the array under
conditions that promote formation of hybrids
of the length of said variable sequence but
not shorter lengths;

c) ligating the masking oligonucleotides to partial
copies that have hybridized to said variable
sequence by their variable termini; and

d) increasing the stringency of the hybridization
conditions to remove hybrids shorter than the
combined length of the masking oligonucleotide
and the variable sequence.
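
The selection logic of steps b) to d) of claim 13 may be pictured with a short Python sketch, offered as an illustration under stated assumptions rather than as a description of the laboratory procedure. It assumes a single array area whose immobilized oligonucleotide reads 5'-constant-variable-3', so that a partial copy abuts the masking oligonucleotide only when the segment pairing with the variable sequence lies at the 3' terminus of the copy. Ligation and the stringency wash are reduced to a simple yes/no decision, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def sort_by_terminus(copies, variable):
    """Model one array area: return (retained, washed_off).

    Step b): a copy hybridizes if it contains the complement of the
    variable sequence anywhere along its length.
    Step c): only a copy hybridized by its 3' terminus sits next to the
    masking oligonucleotide and can be ligated to it (assumed geometry).
    Step d): raising the stringency removes every unligated hybrid.
    """
    target = reverse_complement(variable)
    retained, washed_off = [], []
    for copy in copies:
        if target not in copy:
            continue  # no hybrid forms in step b)
        if copy.endswith(target):
            retained.append(copy)    # ligated, survives step d)
        else:
            washed_off.append(copy)  # internal hybrid, removed in step d)
    return retained, washed_off


if __name__ == "__main__":
    kept, lost = sort_by_terminus(["TTACAGG", "TAGGCA", "TTTT"], "CCT")
    print("retained:", kept)
    print("washed off:", lost)
```

The design point the sketch tries to capture is that ligation, not hybridization alone, distinguishes terminal from internal matches.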

14. A method for sequencing the oligonucleotide content
of a nucleic acid strand utilizing a comprehensive
array of binary immobilized oligonucleotides,
containing all possible variable sequences of a given
length from three to eight nucleotides, comprising

a) preparing a complete set of terminally truncated
copies of said strand;

b) terminally sorting said copies into groups
having common variable ends utilizing a binary
array by the method of claim 13;

c) surveying the oligonucleotide content of
each group by hybridizing the same to said 
comprehensive array; and 

d) detecting where hybridization to the array 
has occurred. 
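
For orientation, a comprehensive array of all variable sequences of length n has 4^n areas, that is 64 areas for n = 3 and 65,536 areas for n = 8. The Python sketch below enumerates such an array and lists the areas at which a given strand would be expected to give a signal. It is a simplified model only: each area is labelled by the strand segment it detects (the immobilized probe being its complement), hybridization is treated as exact matching, and the function names are hypothetical.

```python
from itertools import product


def comprehensive_array(n):
    """List every possible variable sequence of length n (3 to 8)."""
    if not 3 <= n <= 8:
        raise ValueError("variable length must be from 3 to 8 nucleotides")
    return ["".join(p) for p in product("ACGT", repeat=n)]


def oligo_content(strand, n):
    """Return the set of all length-n segments occurring in the strand."""
    return {strand[i:i + n] for i in range(len(strand) - n + 1)}


def survey(strand, n):
    """Return, in array order, the areas expected to show hybridization."""
    present = oligo_content(strand, n)
    return [address for address in comprehensive_array(n) if address in present]


if __name__ == "__main__":
    print(len(comprehensive_array(3)), "areas for n = 3")
    print(survey("ACCTTG", 3))
```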

15. A method for obtaining information to allocate se- 
quenced and ordered fragments from sister chro- 
mosomes to chromosomal linkage groups, com- 
prising 

a) preparing a restriction digest different from 
any digest used to sequence and order said 
fragments, thereby producing fragments that 
span the junctions between said ordered frag- 
ments; 

b) terminally sorting said fragments utilizing a 
binary oligonucleotide array according to the 
method of claim 12; 

c) preparing terminally truncated partial copies 
of the sorted fragments in individual wells of 
said binary oligonucleotide array by a method 
comprising: 

(i) hybridizing the strand to the array by an 
oligonucleotide segment contained in the 
strand, said array comprising predeter- 
mined areas on a surface of a solid sup- 
port, each area having therein immobilized 
oligonucleotides consisting of a predeter- 
mined variable sequence, said hybridiza- 
tion taking place under conditions that pro- 
mote the formation of hybrids of the length 
of the immobilized oligonucleotide in each 
area but not shorter hybrids, and 

(ii) where the strand is hybridized to a 3' 
array, enzymatically extending the immobi- 
lized oligonucleotide using the hybridized 
strand as a template, and where the strand 
is hybridized to a 5' array, hybridizing a 
primer to a priming region contained in the 
3' terminus of the hybridized strand, then 
enzymatically extending the primer to form 
an extension product and ligating the extension
product to the immobilized oligonucleotide;

d) hybridizing said partial copies to an array of 
all variable nucleotides of a given length; and 

e) detecting where hybridization has occurred 
on the latter array. 

16. A method for surveying oligonucleotides in a nucleic
acid strand comprising 

a) randomly degrading the strand into pieces
that are as short as possible but whose average
length exceeds by at least one nucleotide the
length of oligonucleotides to be surveyed by hybridization
to variable sequences of binary oligonucleotides;

b) ligating the pieces to a ligating oligonucleotide
complementary to at least a portion of a
constant sequence of immobilized oligonucleotides
in a binary array according to claim 1;

c) hybridizing the pieces to the binary array,
said binary array having immobilized oligonu- 
cleotides in an ordered array therein and con- 
sisting of a constant sequence adjacent to a 
variable sequence, the immobilized oligonucle- 
otides in an individual area of the array having 
the same sequence; and 

d) detecting the hybrids formed. 



Patentansprüche

1. Sektionierte binäre Oligonucleotidanordnung umfassend
eine Anordnung vorbestimmter Bereiche
auf einer Oberfläche eines festen Trägers, wobei
die Bereiche körperlich voneinander in Abschnitte
derart getrennt sind, dass Nucleinsäuren in einer in
einem Abschnitt gebildeten wässrigen Lösung nicht
in einen anderen Abschnitt wandern können, wobei
jeder Bereich darin kovalent an die Oberfläche gebundene
mehrfache Kopien eines binären Oligonucleotids
einer vorbestimmten Sequenz aufweist,
wobei das binäre Oligonucleotid aus einer konstanten
Sequenz von Basenpaarungsnucleotiden besteht,
die an eine variable Sequenz von Basenpaarungsnucleotiden
anliegen, und wobei die konstante
Sequenz für alle Oligonucleotide in der Anordnung
die gleiche ist.

2. Anordnung nach Anspruch 1, wobei ein oder mehrere
der Nucleotide in den binären Oligonucleotiden
ein modifiziertes Nucleotid ist.

3. Anordnung nach Anspruch 1 oder Anspruch 2, die
alle möglichen variablen Sequenzen einer vorgegebenen
Länge von drei bis acht Nucleotiden enthält.

4. Anordnung nach einem der vorhergehenden Ansprüche,
wobei die binären Oligonucleotide in jedem
Bereich variable Sequenzen derselben Länge
aufweisen.

5. Anordnung nach einem der vorhergehenden Ansprüche,
wobei die binären Oligonucleotide freie 3'-Termini
aufweisen und wobei die binären Oligonucleotide
ihre konstanten Sequenzen am 5'-Ende ihrer
variablen Sequenzen anliegend aufweisen.

6. Sektionierte binäre Oligonucleotidanordnung nach
einem der vorhergehenden Ansprüche, wobei die
konstante Nucleotidsequenz eine oder mehrere
funktionelle Sequenzen umfasst, ausgewählt unter
einer Nucleinsäure-Polymeraseprimierregion, einer
RNA-Polymerase-Promotorregion und einer
Restriktionsendonuclease-Erkennungsstelle.

7. Sektionierte Oligonucleotidanordnung nach einem
der vorhergehenden Ansprüche, wobei die Abschnitte
körperlich durch ein an die Oberfläche angebrachtes
Gitterwerk, durch ein entfernbar an die
Oberfläche angebrachtes Gitterwerk, durch Vertiefungen
in dem festen Träger oder durch ein Gel getrennt
sind, das Bereiche körperlich dadurch trennt,
dass es Nucleinsäuren in einer in einen Bereich eingebrachten
wässrigen Lösung daran hindert, in einen
anderen Bereich zu wandern.

8. Sektionierte Oligonucleotidanordnung nach einem
der vorhergehenden Ansprüche, umfassend des
Weiteren eine Bedeckung, die entfernbar an den festen
Träger anbringbar ist.

9. Anordnung nach Anspruch 8, wobei die Bedeckung
ein Material umfasst, auf das Nucleinsäurestränge
geblottet werden können.

10. Verfahren für das Sortieren einer Mischung von Nucleinsäuresträngen,
umfassend die Schritte des:

a) Bereitstellens einer Lösung enthaltend eine
Mischung von Nucleinsäuresträngen in einsträngiger
Form,

b) Bereitstellens einer ersten binären Oligonucleotidanordnung
vorbestimmter Bereiche auf
einer Oberfläche eines festen Trägers, wobei
jeder Bereich darin kovalent an die Oberfläche
gebundene Kopien eines binären Oligonucleotids
aufweist, die aus einer konstanten Sequenz
von Basenpaarungsnucleotiden bestehen,
die an eine variable Sequenz von Basenpaarungsnucleotiden
anliegen, und wobei die
konstante Nucleotidsequenz für alle Oligonucleotide
in der Anordnung die gleiche ist,

c) Kontaktierens der Lösung mit einer ersten binären
Oligonucleotidanordnung, und

d) Hybridisierens der Nucleinsäurestränge an
die binären Oligonucleotide in der Anordnung
unter Bedingungen, die genügend hart sind,
um Hybride der Länge der immobilisierten Oligonucleotide,
jedoch keine kürzeren Hybride,
zu begünstigen.

11. Verfahren nach Anspruch 10, wobei die erste binäre
Oligonucleotidanordnung eine sektionierte Anordnung
nach einem der Ansprüche 1-9 ist.

12. Verfahren nach Anspruch 10 oder Anspruch 11, wobei
die Nucleinsäurestränge eine gemeinsame terminale
Restriktionsstelle aufweisen, die zu der konstanten
Sequenz komplementär ist.

13. Verfahren für das Sortieren terminal gestutzter Partialkopien
von mindestens einem Nucleinsäurestrang
durch ihre variablen Termini unter Zuhilfenahme
einer Anordnung von immobilisierten binären
Oligonucleotiden mit einer konstanten Sequenz
von mindestens drei Nucleotiden, die an eine variable
Sequenz von mindestens drei Nucleotiden anliegen,
umfassend:

a) das Hybridisieren, an die immobilisierten Oligonucleotide,
eines maskierenden Oligonucleotids,
das entweder zu der konstanten Sequenz
oder zu einem Teil derselben, das an der
variablen Sequenz anliegt, komplementär ist;

b) das Hybridisieren der Partialkopien an die
Anordnung unter Bedingungen, die die Bildung
von Hybriden der Länge der variablen Sequenz,
jedoch keine kürzeren Längen fördern;

c) das Binden der maskierenden Oligonucleotide
an die Partialkopien, die an die variable Sequenz
durch ihre variablen Termini hybridisiert
worden sind; und

d) das Erhöhen der Stringenz der Hybridisierungsbedingungen,
um Hybride zu entfernen,
die kürzer sind als die kombinierte Länge des
maskierenden Oligonucleotids und der variablen
Sequenz.

14. Verfahren für das Sequenzieren des Oligonucleotidgehalts
eines Nucleinsäurestrangs unter Zuhilfenahme
einer umfassenden Anordnung von binären
immobilisierten Oligonucleotiden, die alle möglichen
variablen Sequenzen einer vorgegebenen
Länge von drei bis acht Nucleotiden enthalten, umfassend

a) das Zubereiten eines vollständigen Satzes
terminal gestutzter Kopien des Strangs;

b) das terminale Sortieren der Kopien in Gruppen
mit gemeinsamen variablen Enden unter
Zuhilfenahme einer binären Anordnung durch
das Verfahren von Anspruch 13;

c) das Überprüfen des Oligonucleotidgehalts
jeder Gruppe durch Hybridisieren derselben an
die umfassende Anordnung; und

d) das Bestimmen, wo die Hybridisierung an die
Anordnung stattgefunden hat.

15. Verfahren für das Erhalten von Informationen zum
Zuweisen sequenzierter und geordneter Fragmente
aus Schwesterchromosomen zu den chromosomalen
Verknüpfungsgruppen, umfassend

a) das Zubereiten eines Restriktionsdigestivums,
das sich von irgendeinem zum Sequenzieren
und Ordnen der Fragmente verwendeten
Digestivum unterscheidet, unter Bildung
von Fragmenten, die die Verbindungsstellen
zwischen den geordneten Fragmenten überbrücken;

b) das terminale Sortieren der Fragmente unter
Zuhilfenahme einer binären Oligonucleotidanordnung
nach dem Verfahren von Anspruch 12;

c) das Zubereiten terminal gestutzter Partialkopien
der sortierten Fragmente in einzelnen Vertiefungen
der binären Oligonucleotidanordnung
durch ein Verfahren umfassend:

(i) das Hybridisieren des Strangs an die
Anordnung durch ein Oligonucleotidsegment,
das in dem Strang enthalten ist, wobei
die Anordnung vorbestimmte Bereiche
auf einer Oberfläche eines festen Trägers
umfasst, wobei jeder Bereich darin immobilisierte
Oligonucleotide aufweist bestehend
aus einer vorbestimmten variablen
Sequenz, wobei die Hybridisierung unter
Bedingungen stattfindet, die die Bildung
von Hybriden der Länge des immobilisierten
Oligonucleotids in jedem Bereich, jedoch
keiner kürzeren Hybride begünstigt,
und

(ii) wobei der Strang zu einer 3'-Anordnung
hybridisiert wird, durch enzymatisches
Verlängern des immobilisierten Oligonucleotids
unter Zuhilfenahme des hybridisierten
Strangs als Matrize, und wobei der
Strang zu einer 5'-Anordnung hybridisiert
wird, unter Hybridisieren eines Primers an
die im 3'-Terminus des hybridisierten
Strangs enthaltene Primerregion, darauffolgendes
enzymatisches Verlängern des
Primers unter Bildung eines Verlängerungsprodukts
und Binden des Verlängerungsprodukts
mit dem immobilisierten Oligonucleotid;

d) Hybridisieren der Partialkopien an die Anordnung
aller variablen Nucleotide einer vorgegebenen
Länge; und

e) Bestimmen, wo die Hybridisierung auf letzterer
Anordnung stattgefunden hat.

16. Verfahren für das Überprüfen von Oligonucleotiden
in einem Nucleinsäurestrang, umfassend

a) das wahllose Abbauen des Strangs in Stücke,
die so kurz wie möglich sind, deren durchschnittliche
Länge die Länge der zu überwachenden
Oligonucleotide jedoch um mindestens
ein Nucleotid übersteigt, durch Hybridisieren
an variable Sequenzen binärer Oligonucleotide;

b) Binden der Stücke an ein bindendes Oligonucleotid,
das zu mindestens einem Teil der
konstanten Sequenz immobilisierter Oligonucleotide
komplementär ist, in einer binären Anordnung
nach Anspruch 1;

c) Hybridisieren der Stücke an die binäre Anordnung,
wobei die binäre Anordnung immobilisierte
Oligonucleotide in einer geordneten Anordnung
darin aufweist und aus einer konstanten
Sequenz besteht, die an einer variablen Sequenz
anliegt, wobei die immobilisierten Oligonucleotide
in einem einzelnen Bereich der
Anordnung die gleiche Sequenz aufweisen; und

d) Bestimmen der gebildeten Hybride.



Revendications 

1. Matrice à oligonucléotides binaires compartimentée
comprenant une matrice de zones prédéterminées
sur une surface d'un support solide, dans laquelle
lesdites zones sont séparées physiquement
les unes des autres en compartiments, de telle sorte
que des acides nucléiques dans une solution
aqueuse générés dans un compartiment ne puissent
pas migrer vers un autre compartiment, chaque
zone comportant, liées de façon covalente à
ladite surface, de multiples copies d'un oligonucléotide
binaire d'une séquence prédéterminée, ledit oligonucléotide
binaire étant constitué d'une séquence
constante de nucléotides d'appariement de bases
adjacente à une séquence variable de nucléotides
d'appariement de bases et la séquence constante
étant la même pour tous les oligonucléotides
de la matrice.

2. Matrice selon la revendication 1, dans laquelle au
moins l'un des nucléotides dans lesdits oligonucléotides
binaires est un nucléotide modifié.

3. Matrice selon la revendication 1 ou 2, qui contient
toutes les séquences variables possibles d'une longueur
donnée de trois à huit nucléotides.

4. Matrice selon l'une quelconque des revendications
précédentes, dans laquelle les oligonucléotides binaires
dans chaque zone ont des séquences variables
de même longueur.

5. Matrice selon l'une quelconque des revendications
précédentes, dans laquelle les oligonucléotides binaires
ont des extrémités 3' libres et dans laquelle
les oligonucléotides binaires ont leurs séquences
constantes adjacentes à l'extrémité 5' de leurs séquences
variables.

6. Matrice à oligonucléotides binaires compartimentée
selon l'une quelconque des revendications précédentes,
dans laquelle ladite séquence constante
de nucléotides comprend une ou plusieurs séquences
fonctionnelles choisies parmi une région
d'amorçage par la polymérase de l'acide nucléique,
une région de promoteur de l'ARN polymérase et
un site de reconnaissance d'une endonucléase de
restriction.

7. Matrice à oligonucléotides compartimentée selon
l'une quelconque des revendications précédentes,
dans laquelle lesdits compartiments sont physiquement
séparés par une grille attachée à ladite surface,
par une grille pouvant être fixée de façon démontable
à ladite surface, par des puits présents
dans ledit support solide, ou par un gel qui sépare
physiquement lesdites zones, en empêchant que
des acides nucléiques d'une solution aqueuse placée
dans une zone ne migrent vers une autre zone.

8. Matrice à oligonucléotides compartimentée selon
l'une quelconque des revendications précédentes,
comprenant en outre un couvercle pouvant être fixé
de façon démontable audit support solide.

9. Matrice selon la revendication 8, dans laquelle ledit
couvercle comprend une matière sur laquelle des
brins d'acide nucléique peuvent être transférés.

10. Procédé de tri d'un mélange de brins d'acide nucléique
comprenant les étapes consistant à :

a) se procurer une solution contenant un mélange
de brins d'acides nucléiques sous forme
de simple brin,

b) se procurer une première matrice d'oligonucléotides
binaires à zones prédéterminées sur
une surface d'un support solide, chaque zone
contenant, liées de façon covalente à ladite
surface, des copies d'un oligonucléotide binaire
constitué d'une séquence constante de nucléotides
d'appariement de bases adjacente à
une séquence variable de nucléotides d'appariement
de bases, la séquence constante de
nucléotides étant la même pour tous les oligonucléotides
de la matrice,

c) mettre en contact ladite solution avec ladite
première matrice d'oligonucléotides binaires,
et

d) hybrider lesdits brins d'acide nucléique avec
les oligonucléotides binaires dans ladite matrice
dans des conditions suffisamment stringentes
pour favoriser la formation d'hybrides de la
longueur des oligonucléotides immobilisés
mais non d'hybrides plus courts.

11. Procédé selon la revendication 10, dans lequel ladite
première matrice d'oligonucléotides binaires
est une matrice compartimentée selon l'une quelconque
des revendications 1 à 9.

12. Procédé selon la revendication 10 ou 11, dans lequel
lesdits brins d'acide nucléique ont un site de
restriction terminal commun qui est complémentaire
de ladite séquence constante.

13. Procédé pour trier des copies partielles tronquées
aux extrémités d'au moins un brin d'acide nucléique
par leurs extrémités variables en utilisant une matrice
d'oligonucléotides binaires immobilisés ayant
une séquence constante d'au moins trois nucléotides
adjacente à une séquence variable d'au moins
trois nucléotides, comprenant :

a) l'hybridation avec lesdits oligonucléotides
immobilisés d'un oligonucléotide de masquage
qui est complémentaire soit de ladite séquence
constante, soit d'une partie de celle-ci qui est
adjacente à ladite séquence variable ;

b) l'hybridation des copies partielles avec la
matrice dans des conditions qui favorisent la
formation d'hybrides de la longueur de ladite
séquence variable mais non de longueurs plus
courtes ;

c) la ligature des oligonucléotides de masquage
aux copies partielles qui se sont hybridées
avec ladite séquence variable par leurs extrémités
variables ; et

d) l'augmentation de la stringence des conditions
d'hybridation pour éliminer des hybrides
plus courts que la longueur combinée de l'oligonucléotide
de masquage et de la séquence
variable.

14. Procédé pour séquencer le contenu en oligonucléotides
d'un brin d'acide nucléique en utilisant une
matrice complète d'oligonucléotides binaires immobilisés
contenant toutes les séquences variables
possibles d'une longueur donnée de trois à huit nucléotides,
comprenant :

a) la préparation d'un jeu complet de copies
tronquées aux extrémités dudit brin ;

b) le tri par extrémités desdites copies en des
groupes ayant des extrémités variables communes
en utilisant une matrice binaire par le
procédé de la revendication 13 ;

c) l'étude du contenu en oligonucléotides de
chaque groupe en l'hybridant avec ladite matrice
complète ; et

d) la détection de l'endroit où l'hybridation avec
la matrice a eu lieu.

15. Procédé pour obtenir des informations pour attribuer
des fragments séquencés et ordonnés provenant
de chromosomes frères à des groupes de
liaison chromosomique, comprenant :

a) la préparation d'une digestion par des enzymes
de restriction différente d'une quelconque
digestion utilisée pour séquencer et ordonner
lesdits fragments, en produisant ainsi des fragments
dont la portée comprend les jonctions
entre lesdits fragments ordonnés ;

b) le tri par extrémités desdits fragments en utilisant
une matrice d'oligonucléotides binaires
selon le procédé de la revendication 12 ;

c) la préparation de copies partielles tronquées
aux extrémités des fragments triés dans des
puits individuels de ladite matrice d'oligonucléotides
binaires par un procédé comprenant :

(i) l'hybridation du brin avec la matrice par
un segment d'oligonucléotide contenu
dans le brin, ladite matrice comprenant des
zones prédéterminées sur une surface
d'un support solide, chaque zone contenant
des oligonucléotides immobilisés
constitués d'une séquence variable prédéterminée,
ladite hybridation ayant lieu dans
des conditions qui favorisent la formation
d'hybrides de la longueur de l'oligonucléotide
immobilisé dans chaque zone mais
non d'hybrides plus courts, et

(ii) là où le brin est hybridé à une matrice
3', l'extension de façon enzymatique de
l'oligonucléotide immobilisé en utilisant le
brin hybridé comme une matrice, et là où
le brin est hybridé à une matrice 5', l'hybridation
d'une amorce avec une région
d'amorçage contenue dans l'extrémité 3'
du brin hybridé, puis l'extension de façon
enzymatique de l'amorce pour former un
produit d'extension et la ligature du produit
d'extension à l'oligonucléotide immobilisé ;

d) l'hybridation desdites copies partielles avec
une matrice de tous les nucléotides variables
d'une longueur donnée ; et

e) la détection de l'endroit où l'hybridation a eu
lieu sur cette dernière matrice.

16. Procédé pour étudier les oligonucléotides dans un
brin d'acide nucléique, comprenant

a) la dégradation aléatoire du brin en morceaux
qui soient aussi courts que possible mais dont
la longueur moyenne dépasse d'au moins un
nucléotide la longueur des oligonucléotides à
étudier par hybridation avec des séquences variables
d'oligonucléotides binaires ;

b) la ligature des morceaux à un oligonucléotide
de ligature complémentaire d'au moins une
partie d'une séquence constante d'oligonucléotides
immobilisés dans une matrice binaire selon
la revendication 1 ;

c) l'hybridation des morceaux avec la matrice
binaire, ladite matrice binaire comprenant des
oligonucléotides immobilisés dans une matrice
ordonnée, constitués d'une séquence constante
adjacente à une séquence variable, les oligonucléotides
immobilisés dans une zone individuelle
de la matrice ayant la même
séquence ; et

d) la détection des hybrides formés.

Figure 2 - Sectioned Array

Figure 3

Figure 4 (flow chart, steps in order): Undigested DNA; DNA digestion with
a restriction endonuclease and repair of fragment ends; melting DNA
strands apart; hybridization of DNA strands to a sorting array;
extension of the immobilized primer; PCR amplification.

Figure 5

Figure 6: Sorting the fragments into groups of strands (sequence of the diploid genome).

Unindexed address sets (figure panel): each address set is a subset of the
trinucleotides ACC, ATG, CAT, CCT, CTA, CTT, GGT, GTA, GTC, TAA, TAC, TCC,
TGG and TTG.

Grouped address sets (figure panel): the trinucleotides fall into three
groups: CAT, TAA; CTA, GTC, TCC; and ACC, ATG, CCT, CTT, GGT, GTA, TAC,
TGG, TTG.

Identified strand sets

A: ACC ATG CAT CCT CTT GGT GTA TAA TAC TGG TTG

Figure 9a



B: ACC ATG CCT CTA CTT GGT GTA GTC TAC TCC TGG TTG

Figure 9b
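
The relationship between address sets, groups and strand sets suggested by the drawings can be reproduced with a few lines of Python. This is a sketch under assumptions and is not taken from the specification: it supposes that oligonucleotides occurring in exactly the same address sets belong to the same group, and the three address sets used here (labelled addr1 to addr3) are hypothetical inputs built from strand sets A and B above.

```python
from collections import defaultdict


def group_by_signature(address_sets):
    """Group oligos that occur in exactly the same collection of addresses.

    address_sets maps an address label to the set of oligos detected there.
    Returns a dict from a frozenset of address labels (the signature) to
    the set of oligos sharing that signature.
    """
    signature = defaultdict(set)
    for address, oligos in address_sets.items():
        for oligo in oligos:
            signature[oligo].add(address)
    groups = defaultdict(set)
    for oligo, addresses in signature.items():
        groups[frozenset(addresses)].add(oligo)
    return dict(groups)


# Strand sets A and B as listed for Figures 9a and 9b.
STRAND_A = set("ACC ATG CAT CCT CTT GGT GTA TAA TAC TGG TTG".split())
STRAND_B = set("ACC ATG CCT CTA CTT GGT GTA GTC TAC TCC TGG TTG".split())

if __name__ == "__main__":
    # Hypothetical address sets: one holding only A, one only B, one both.
    address_sets = {
        "addr1": STRAND_A,
        "addr2": STRAND_B,
        "addr3": STRAND_A | STRAND_B,
    }
    for addresses, oligos in group_by_signature(address_sets).items():
        print(sorted(addresses), sorted(oligos))
```

Under these assumptions the grouping returns the two strand-specific groups and the shared group; the union of a strand-specific group with the shared group gives back the corresponding strand set.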